Method for assessing risk of breast cancer

Document No.: 1191681 Publication date: 2020-08-28

Reading note: This technology, Method for assessing risk of breast cancer, was designed by Richard Allman, Gillian Dite and John Hopper on 2018-10-12. Main content: Methods and systems for assessing the risk of a human female subject for developing breast cancer. In particular, the present disclosure relates to combining a first clinical risk assessment and at least a second clinical risk assessment based on breast density with a genetic risk assessment to obtain an improved risk analysis.

1. A method for assessing the risk of a human female subject for developing breast cancer, the method comprising:

performing a first clinical risk assessment on the female subject;

performing a second clinical risk assessment on the female subject, wherein the second clinical assessment is based at least on breast density;

performing a genetic risk assessment on the female subject, wherein the genetic risk assessment involves detecting the presence of at least two polymorphisms known to be associated with breast cancer in a biological sample derived from the female subject; and

combining the first clinical risk assessment, the second clinical risk assessment, and the genetic risk assessment to obtain an overall risk of the human female subject to suffer from breast cancer.

2. The method of claim 1, wherein the second clinical risk assessment is based solely on breast density.

3. The method of claim 1 or claim 2, wherein performing a first clinical risk assessment uses a model selected from the group consisting of: the Gail model, Claus table, BOADICEA, Jonker model, Claus extended Formula, Tyrer-Cuzick model, and Manchester scoring system.

4. The method of claim 3, wherein the first clinical risk assessment is obtained using a Gail model or a BOADICEA or a Tyrer-Cuzick model.

5. The method of any one of claims 1-4, wherein performing the first clinical risk assessment comprises obtaining information from the female subject regarding one or more of: history of breast cancer, ductal carcinoma or lobular carcinoma, age at first menstrual period, age at which she first gave birth, family history of breast cancer, previous breast biopsy results, and race/ethnicity.

6. The method of claim 5, wherein the first clinical risk assessment is based only on two or all of the female subject's age, family history of breast cancer, and ethnicity.

7. The method of claim 5 or claim 6, wherein the first clinical risk assessment is based solely on the age of the female subject and the family history of breast cancer.

8. The method of any one of claims 1-7, comprising detecting the presence of at least 3, 4, 5, 6, 7, 8, 9, 10, 20, 30, 40, 50, 60, 70, 80, 100, 120, 140, 160, 180, or 200 polymorphisms known to be associated with breast cancer.

9. The method of any one of claims 1-8, wherein the polymorphism is selected from table 12 or is a polymorphism in linkage disequilibrium with one or more thereof.

10. The method of any one of claims 1-9, comprising detecting at least 50, 80, 100, or 150 of the polymorphisms shown in table 12, or polymorphisms in linkage disequilibrium with one or more thereof.

11. The method of any one of claims 1-10, comprising detecting all 203 polymorphisms shown in table 12, or a polymorphism in linkage disequilibrium with one or more thereof.

12. The method of any one of claims 1-8, wherein the polymorphism is selected from table 6 or is a polymorphism in linkage disequilibrium with one or more thereof.

13. The method of any one of claims 1-8, comprising detecting at least 72 polymorphisms associated with breast cancer, wherein at least 67 polymorphisms are selected from table 7, or are polymorphisms in linkage disequilibrium with one or more thereof, and the remaining polymorphisms are selected from table 6, or are polymorphisms in linkage disequilibrium with one or more thereof.

14. The method of any one of claims 1-8, wherein when the female subject is Caucasian, the method comprises detecting at least 72 polymorphisms shown in table 9, or polymorphisms in linkage disequilibrium with one or more thereof.

15. The method of claim 14, wherein when the female subject is Caucasian, the method comprises detecting all 77 polymorphisms shown in table 9, or polymorphisms in linkage disequilibrium with one or more thereof.

16. The method of any one of claims 1-8, wherein when the female subject is Black or African American, the method comprises detecting at least 74 polymorphisms shown in table 10, or polymorphisms in linkage disequilibrium with one or more thereof.

17. The method of any one of claims 1-8, wherein when the female subject is Black or African American, the method comprises detecting all 74 polymorphisms shown in table 13, or polymorphisms in linkage disequilibrium with one or more thereof.

18. The method according to any one of claims 1-8, wherein when the female subject is of Hispanic descent, the method comprises detecting at least 71 polymorphisms shown in Table 11, or polymorphisms in linkage disequilibrium with one or more thereof.

19. The method according to any one of claims 1 to 8, wherein when the female subject is of Hispanic descent, the method comprises detecting all 71 polymorphisms shown in Table 14, or polymorphisms in linkage disequilibrium with one or more thereof.

20. The method of any one of claims 1-19, wherein combining the first clinical risk assessment, second clinical risk assessment, and genetic risk assessment comprises multiplying the risk assessments.

21. The method of any one of claims 1-15 or 20, wherein the female subject is Caucasian.

22. The method of any one of claims 1-21, wherein if the subject is determined to be at risk for breast cancer, the subject is more likely to be responsive to estrogen inhibition than non-responsive.

23. The method of any one of claims 1-22, wherein the breast cancer is estrogen receptor positive or estrogen receptor negative.

24. A method for determining the need for a routine diagnostic test for breast cancer in a human female subject, the method comprising assessing the overall risk of the subject for developing breast cancer using the method of any one of claims 1 to 23.

25. The method of claim 24, wherein a risk score of greater than about 20% lifetime risk indicates that the subject should be enrolled in a screening breast MRI and mammography program.

26. A method of screening a human female subject for breast cancer, the method comprising assessing the overall risk of the subject for developing breast cancer using the method of any one of claims 1-23 and routinely screening the subject for breast cancer if the subject is assessed as having a risk of developing breast cancer.

27. A method for determining the need of a human female subject for prophylactic anti-breast cancer therapy, the method comprising assessing the overall risk of the subject for developing breast cancer using the method of any one of claims 1-23.

28. The method of claim 27, wherein a risk score of greater than about 1.66% 5-year risk indicates that estrogen receptor therapy should be offered to the subject.

29. A method for preventing or reducing the risk of breast cancer in a human female subject, the method comprising assessing the overall risk of the subject for having breast cancer using the method according to any one of claims 1 to 23, and administering an anti-breast cancer therapy to the subject if they are assessed as having a risk of having breast cancer.

30. The method of claim 29, wherein the therapy inhibits estrogen.

31. An anti-breast cancer therapy for preventing breast cancer in a human female subject at risk for breast cancer, wherein the subject is assessed as being at risk for breast cancer according to the method of any one of claims 1-23.

32. A method for stratifying a group of human female subjects undergoing a clinical trial for a candidate therapy, the method comprising assessing the overall risk of the subject for developing breast cancer using the method of any one of claims 1 to 23, and using the results of the assessment to select subjects more likely to respond to the therapy.

33. A computer-implemented method for assessing a risk of a human female subject for developing breast cancer, the method operable in a computing system comprising a processor and memory, the method comprising:

receiving first clinical risk data, second clinical risk data, and genetic risk data for the female subject, wherein the first clinical risk data, second clinical risk data, and genetic risk data are obtained by the method of any one of claims 1-23;

processing the data to combine clinical risk data with genetic risk data to obtain a risk of the human female subject for breast cancer;

outputting a risk of the human female subject to suffer from breast cancer.

34. A system for assessing the risk of a human female subject for developing breast cancer, the system comprising:

system instructions for performing a first clinical risk assessment, a second clinical risk assessment and a genetic risk assessment on a female subject according to any one of claims 1-23; and

system instructions for combining the first clinical risk assessment, the second clinical risk assessment, and the genetic risk assessment to obtain the risk of the human female subject for developing breast cancer.

Technical Field

The present disclosure relates to methods and systems for assessing the risk of a human female subject for developing breast cancer. In particular, the present disclosure relates to combining a first clinical risk assessment, at least a second clinical assessment based on breast density, with a genetic risk assessment to improve risk analysis.

Background

It is estimated that about one in eight women in the United States will develop breast cancer in her lifetime. In 2013, over 230,000 women were expected to be diagnosed with invasive breast cancer, and nearly 40,000 were expected to die from the disease (ACS Breast Cancer Facts & Figures 2013-14). There are therefore compelling reasons to predict which women will develop the disease and to take measures to prevent it.

Extensive research has focused on phenotypic risk factors including age, family history, reproductive history, and benign breast disease. Various combinations of these risk factors have been compiled into the two most commonly used risk prediction algorithms: the Gail model (suited to the general population; also known as the Breast Cancer Risk Assessment Tool, BCRAT) and the Tyrer-Cuzick model (suited to women with a strong family history).

These risk prediction algorithms rely heavily on self-reported clinical information, usually obtained through questionnaires. In some cases, the relevant clinical information is not provided. This is not unexpected, because some questions rely on recalling events from decades earlier (age at first menses), while others require a degree of medical knowledge on the part of the patient and/or access to actual pathology reports (atypical hyperplasia). This in turn calls into question the accuracy of the data entered into the algorithm by those who give an answer other than 'unknown'. For example, the presence or absence of atypical hyperplasia is an important factor in breast cancer risk assessment (relative risk > 4.0).

Recently, commercial tests for assessing the risk of developing breast cancer have sought to predict breast cancer risk by combining a clinical risk score with a genetic risk score. However, the first clinical risk assessment component of these tests suffers from the above-described limitations of self-reported clinical information. Accordingly, there is a need in the art for improved breast cancer risk assessment tests.

Disclosure of Invention

The present inventors have found that a breast cancer risk model that combines a first clinical risk assessment, at least a second clinical risk assessment based on breast density and a genetic risk assessment provides an improved risk discrimination method for assessing the risk of a subject for breast cancer.

In one aspect, the present invention provides a method for assessing the risk of a human female subject for developing breast cancer, comprising:

performing a first clinical risk assessment on the female subject;

performing a second clinical risk assessment on the female subject, wherein the second clinical assessment is based at least on breast density;

performing a genetic risk assessment on the female subject, wherein the genetic risk assessment involves detecting the presence of at least two polymorphisms known to be associated with breast cancer in a biological sample derived from the female subject; and

combining the first clinical risk assessment, the second clinical risk assessment, and the genetic risk assessment to obtain an overall risk of the human female subject to suffer from breast cancer.

In one embodiment, the second clinical risk assessment is based solely on breast density.

In one embodiment, the first clinical risk assessment is performed using a model selected from the group consisting of: the Gail model, Claus table, BOADICEA, Jonker model, Claus Extended Formula, Tyrer-Cuzick model, and Manchester scoring system. In some embodiments, the first clinical risk assessment is obtained using a Gail model or a BOADICEA or a Tyrer-Cuzick model.

In another embodiment, the first clinical risk assessment comprises obtaining information from the female regarding one or more of: history of breast cancer, ductal carcinoma or lobular carcinoma, age at first menstrual period, age of her first delivery, family history of breast cancer, previous breast biopsy results, and race/ethnicity. In one embodiment, the first clinical risk assessment is based only on two or all of the female subject's age, breast cancer family history, and ethnic group. In one embodiment, the first clinical risk assessment is based solely on the age of the female subject and the family history of breast cancer.

In one embodiment, the methods described herein comprise detecting the presence of at least 3, 4, 5, 6, 7, 8, 9, 10, 20, 30, 40, 50, 60, 70, 80, 100, 120, 140, 160, 180, or 200 polymorphisms known to be associated with breast cancer.

In one embodiment, the polymorphism is selected from table 12 or a polymorphism in linkage disequilibrium with one or more thereof.

In one embodiment, the methods described herein comprise detecting at least 50, 80, 100, or 150 of the polymorphisms shown in table 12, or polymorphisms in linkage disequilibrium with one or more thereof. In one embodiment, the methods described herein comprise detecting all 203 polymorphisms shown in table 12, or polymorphisms in linkage disequilibrium with one or more thereof.

In one embodiment, the polymorphism is selected from table 6 or a polymorphism in linkage disequilibrium with one or more thereof.

In another embodiment, the methods described herein comprise detecting at least 72 polymorphisms associated with breast cancer, wherein at least 67 polymorphisms are selected from table 7, or are polymorphisms in linkage disequilibrium with one or more thereof, and the remaining polymorphisms are selected from table 6, or are polymorphisms in linkage disequilibrium with one or more thereof.

In one embodiment, when the female subject is Caucasian, the methods described herein comprise detecting at least 72 polymorphisms shown in table 9, or polymorphisms in linkage disequilibrium with one or more thereof. In one embodiment, when the female subject is Caucasian, the methods described herein comprise detecting all 77 polymorphisms shown in table 9, or polymorphisms in linkage disequilibrium with one or more thereof.

In one embodiment, when the female subject is Black or African American, the methods described herein comprise detecting at least 74 of the polymorphisms shown in table 10, or polymorphisms in linkage disequilibrium with one or more thereof. In one embodiment, when the female subject is Black or African American, the methods described herein comprise detecting all 74 polymorphisms shown in table 13, or polymorphisms in linkage disequilibrium with one or more thereof.

In one embodiment, when the female subject is Hispanic, the methods described herein comprise detecting at least 71 polymorphisms shown in table 11, or polymorphisms in linkage disequilibrium with one or more thereof. In one embodiment, when the female subject is Hispanic, the methods described herein comprise detecting all 71 polymorphisms shown in table 14, or polymorphisms in linkage disequilibrium with one or more thereof.

In some embodiments, combining the first clinical risk assessment, the second clinical risk assessment, and the genetic risk assessment comprises multiplying the risk assessments.
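The multiplicative combination can be illustrated with a minimal Python sketch. The function name and the example values are hypothetical, and the sketch assumes each assessment has already been expressed as a relative risk on a common scale; the text here does not specify how the individual assessments are scaled before they are multiplied.

```python
from math import prod


def combine_risk_assessments(first_clinical, breast_density, genetic):
    """Multiply three risk assessments, each assumed to be a relative risk."""
    components = (first_clinical, breast_density, genetic)
    if any(value <= 0 for value in components):
        raise ValueError("each relative risk must be positive")
    return prod(components)


# Hypothetical relative risks for the three assessments.
overall = combine_risk_assessments(1.5, 1.2, 2.0)
print(f"combined relative risk: {overall:.2f}")
```

Under the assumption that the three components are approximately independent, multiplying them keeps each assessment's contribution separable, so a missing component could be treated as a neutral factor of 1.0.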

In some embodiments, the female subject is Caucasian.

In another embodiment, if the subject is determined to be at risk for breast cancer, the subject is more likely to be responsive to estrogen inhibition than non-responsive.

In one embodiment, the breast cancer is estrogen receptor positive or estrogen receptor negative.

In one embodiment, the overall risk of the subject for developing breast cancer is an absolute risk. Absolute risk is the risk for a particular subject, not the relative risk of the population. An absolute risk can be described as a numerical probability that a human female subject suffers from breast cancer over a specified period of time (e.g., 5, 10, 15, 20, or more years) or for the remaining life of the subject.

In another aspect, the present invention provides a method for determining the need for a routine diagnostic test for breast cancer in a human female subject, said method comprising assessing the overall risk of the subject for developing breast cancer using the methods described herein.

In one embodiment, a risk score of greater than about 20% lifetime risk indicates that the subject should be enrolled in a screening breast MRI and mammography program.

In another aspect, the invention provides a method of screening for breast cancer in a human female subject, the method comprising assessing the overall risk of the subject for having breast cancer using the methods described herein, and routine screening for breast cancer in the subject if they are assessed as having a risk of having breast cancer.

In another aspect, the invention provides a method for determining the need for a prophylactic anti-breast cancer therapy in a human female subject, the method comprising assessing the overall risk of the subject for developing breast cancer using the methods described herein.

In one embodiment, a risk score of greater than about 1.66% 5-year risk indicates that the subject should be offered estrogen receptor therapy.
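The two risk thresholds stated in this section (about 20% lifetime risk and about 1.66% 5-year risk) can be expressed as a simple decision rule. The following Python sketch is illustrative only: the function name, the constant names, and the returned strings are hypothetical, risks are assumed to be absolute risks given as fractions, and the tolerance implied by "about" is not modeled.

```python
LIFETIME_RISK_THRESHOLD = 0.20     # "about 20% lifetime risk"
FIVE_YEAR_RISK_THRESHOLD = 0.0166  # "about 1.66%" 5-year risk


def suggested_actions(five_year_risk=None, lifetime_risk=None):
    """Return the actions indicated by the stated thresholds.

    Risks are absolute risks expressed as fractions (0.25 means 25%).
    A risk that is not supplied (None) is simply not evaluated.
    """
    actions = []
    if lifetime_risk is not None and lifetime_risk > LIFETIME_RISK_THRESHOLD:
        actions.append("screening breast MRI and mammography")
    if five_year_risk is not None and five_year_risk > FIVE_YEAR_RISK_THRESHOLD:
        actions.append("estrogen receptor therapy")
    return actions


print(suggested_actions(five_year_risk=0.02, lifetime_risk=0.25))
```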

In another aspect, the invention provides a method for preventing or reducing the risk of a human female subject to suffer from breast cancer, the method comprising assessing the overall risk of the subject to suffer from breast cancer using the methods described herein, and administering an anti-breast cancer therapy to the subject if they are assessed as having a risk of suffering from breast cancer.

In one embodiment, the therapy inhibits estrogen.

In another aspect, the invention provides an anti-breast cancer therapy for preventing breast cancer in a human female subject at risk for breast cancer, wherein the subject is assessed as being at risk for breast cancer according to the methods described herein.

In another aspect, the invention provides a method of stratifying a group of human female subjects undergoing a clinical trial for a candidate therapy, the method comprising assessing the overall risk of an individual subject to suffer from breast cancer using the methods described herein, and using the results of the assessment to select subjects more likely to respond to the therapy.

In another aspect, the present invention provides a computer-implemented method for assessing a risk of a human female subject for breast cancer, the method operable in a computing system comprising a processor and a memory, the method comprising:

receiving first clinical risk data, second clinical risk data, and genetic risk data for the female subject, wherein the first clinical risk data, second clinical risk data, and genetic risk data are obtained by the methods described herein;

processing the data to combine clinical risk data with genetic risk data to obtain a risk of the human female subject for breast cancer;

outputting a risk of the human female subject to suffer from breast cancer.

In another aspect, the present invention provides a system for assessing the risk of a human female subject for breast cancer, the system comprising:

system instructions for performing a first clinical risk assessment, a second clinical risk assessment, and a genetic risk assessment on a female subject as described herein; and

system instructions for combining the first clinical risk assessment, the second clinical risk assessment, and the genetic risk assessment to obtain the risk of the human female subject for developing breast cancer.

Unless specifically stated otherwise, any example herein should be considered to apply to any other example mutatis mutandis.

The scope of the present disclosure is not to be limited by the specific embodiments described herein, which are intended for illustrative purposes only. Functionally equivalent products, compositions, and methods are clearly within the scope of the present disclosure, as described herein.

Throughout this specification, unless specifically stated otherwise or the context requires otherwise, reference to a single step, composition of matter, group of steps or group of compositions of matter shall be taken to encompass one and a plurality (i.e., one or more) of those steps, compositions of matter, groups of steps or groups of compositions of matter.

Throughout this specification the word "comprise", or variations such as "comprises" or "comprising", will be understood to imply the inclusion of a stated element, integer or step, or group of elements, integers or steps, but not the exclusion of any other element, integer or step, or group of elements, integers or steps.

The present disclosure is described below by way of the following non-limiting examples and with reference to the accompanying drawings.

Detailed Description

General techniques and definitions

Unless specifically defined otherwise, all technical and scientific terms used herein are to be considered as having the same meaning as commonly understood by one of ordinary skill in the art (e.g., oncology, breast cancer analysis, molecular genetics, risk assessment, and clinical research).

Unless otherwise indicated, the molecular and immunological techniques used in this disclosure are standard procedures well known to those skilled in the art. Such techniques are described and explained throughout the literature in sources such as J. Perbal, A Practical Guide to Molecular Cloning, John Wiley and Sons (1984); J. Sambrook et al., Molecular Cloning: A Laboratory Manual, Cold Spring Harbor Laboratory Press (1989); T.A. Brown (editor), Essential Molecular Biology: A Practical Approach, Volumes 1 and 2, IRL Press (1991); D.M. Glover and B.D. Hames (editors), DNA Cloning: A Practical Approach, Volumes 1-4, IRL Press (1995 and 1996); F.M. Ausubel et al. (editors), Current Protocols in Molecular Biology, Greene Pub. Associates and Wiley-Interscience (1988, including all updates to date); Ed Harlow and David Lane (editors), Antibodies: A Laboratory Manual, Cold Spring Harbor Laboratory (1988); and J.E. Coligan et al. (editors), Current Protocols in Immunology, John Wiley & Sons (including all updates to date).

It is to be understood that this disclosure is not limited to particular embodiments, as such may, of course, vary. It is also to be understood that the terminology used herein is for the purpose of describing particular embodiments only, and is not intended to be limiting. As used in this specification and the appended claims, for example, the singular forms "a," "an," and "the" optionally include plural referents unless the content clearly dictates otherwise. Thus, for example, reference to "a probe" optionally includes a plurality of probe molecules; similarly, depending on the context, use of the term "a nucleic acid" optionally actually includes a plurality of copies of the nucleic acid molecule.

As used herein, unless stated to the contrary, the term "about" means +/-10%, more preferably +/-5%, more preferably +/-1% of the stated value.
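The tolerance attached to "about" can be written as a small check. This is a minimal sketch: the function name `is_about` is hypothetical, and the default tolerance of 10% corresponds to the broadest of the three ranges given above, with 0.05 and 0.01 as the narrower, more preferred options.

```python
def is_about(value, stated, tolerance=0.10):
    """Return True if value lies within +/- tolerance (a fraction) of stated."""
    return abs(value - stated) <= tolerance * abs(stated)


print(is_about(21.9, 20.0))                  # within +/-10% of 20
print(is_about(20.5, 20.0, tolerance=0.01))  # outside +/-1% of 20
```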

The methods of the present disclosure can be used to assess the risk of breast cancer in a human female subject. The term "breast cancer" as used herein encompasses any type of breast cancer that a female subject may suffer from. For example, breast cancer can be characterized as Luminal A (ER+ and/or PR+, HER2-, low Ki67), Luminal B (ER+ and/or PR+, and either HER2+ or HER2- with high Ki67), triple negative/basal-like (ER-, PR-, HER2-), or HER2 type (ER-, PR-, HER2+). A phenotype exhibiting a predisposition for breast cancer may indicate, for example, a higher likelihood that the cancer will develop in an individual with the phenotype than in members of the relevant general population.

As used herein, "biological sample" refers to any sample comprising nucleic acids (particularly DNA) from or derived from a human patient, such as body fluids (blood, saliva, urine, etc.), biopsy, tissue and/or waste products from the patient. Thus, tissue biopsy, stool, sputum, saliva, blood, lymph fluid, and the like can be readily screened for polymorphisms, as can essentially any target tissue containing appropriate nucleic acids. In one embodiment, the biological sample is a buccal cell sample. These samples are typically obtained from the patient, after informed consent, by standard medical laboratory methods. The sample may be in a form taken directly from the patient, or may be at least partially processed (purified) to remove at least some non-nucleic acid material.

A "polymorphism" is a locus that is variable; that is, within a population, the nucleotide sequence at a polymorphism has more than one version or allele. An example of a polymorphism is a "single nucleotide polymorphism," which is a polymorphism at a single nucleotide position in a genome (the nucleotide at a given position differs between individuals or populations). Other examples include a deletion or insertion of one or more base pairs at a polymorphic locus.

The term "SNP" or "single nucleotide polymorphism" as used herein refers to a genetic variation between individuals; for example, a single nitrogenous base position in the DNA of organisms that is variable. As used herein, "SNPs" is the plural of SNP. Of course, where reference is made herein to DNA, such reference may include derivatives of the DNA, such as amplicons, RNA transcripts thereof, and the like.

The term "allele" refers to one of two or more different nucleotide sequences that occur or are encoded at a particular locus, or to one of two or more different polypeptide sequences encoded by that locus. For example, a first allele may occur on one chromosome while a second allele occurs on a second homologous chromosome, e.g., on the different chromosomes of a heterozygous individual, or between different homozygous or heterozygous individuals in a population. An allele is "positively" associated with a trait when the allele is linked to the trait and the presence of the allele is an indication that the trait or trait form will occur in an individual comprising the allele. An allele is "negatively" associated with a trait when the allele is linked to the trait and the presence of the allele is an indication that the trait or trait form will not occur in an individual comprising the allele.

A marker polymorphism or allele is "associated" or "correlated" with a given phenotype (breast cancer susceptibility, etc.) when the marker polymorphism or allele can be statistically linked (positively or negatively) to the phenotype. That is, the specified polymorphism is more common in a case population (e.g., breast cancer patients) than in a control population (e.g., individuals not suffering from breast cancer). Methods for determining whether polymorphisms or alleles are statistically linked are known to those of skill in the art. This association is often inferred to be causal in nature, but it need not be; simple genetic linkage to (association with) a locus for a trait that underlies the phenotype is sufficient for the correlation/association to occur.
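One common way to test whether an allele is more frequent in cases than in controls is a Pearson chi-square test on a 2x2 table of allele counts, reported together with an odds ratio. The Python sketch below is an illustration of that general technique and is not taken from this disclosure; the function name and the counts are hypothetical.

```python
def allele_association(case_risk, case_other, control_risk, control_other):
    """Return (odds_ratio, chi_square) for a 2x2 table of allele counts.

    The chi-square statistic has 1 degree of freedom and uses no
    continuity correction; all four counts are assumed to be non-zero.
    """
    n = case_risk + case_other + control_risk + control_other
    odds_ratio = (case_risk * control_other) / (case_other * control_risk)
    numerator = n * (case_risk * control_other - case_other * control_risk) ** 2
    denominator = (
        (case_risk + case_other)
        * (control_risk + control_other)
        * (case_risk + control_risk)
        * (case_other + control_other)
    )
    return odds_ratio, numerator / denominator


# Hypothetical allele counts: 60/40 in cases versus 40/60 in controls.
odds_ratio, chi_square = allele_association(60, 40, 40, 60)
print(odds_ratio, chi_square)
```

A chi-square value above 3.841 corresponds to p < 0.05 at one degree of freedom; studies that test many polymorphisms at once generally apply a much stricter threshold.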

The phrase "linkage disequilibrium" (LD) is used to describe the statistical correlation between two neighboring polymorphic genotypes. In general, LD refers to the correlation between the alleles of a random gamete at the two loci, assuming Hardy-Weinberg equilibrium (statistical independence) between gametes. LD is quantified using either Lewontin's parameter of association (D') or Pearson's correlation coefficient (r) (Devlin and Risch, 1995). Two loci with an LD value of 1 are said to be in complete LD. At the other extreme, two loci with an LD value of 0 are said to be in linkage equilibrium. Linkage disequilibrium is calculated after estimating haplotype frequencies with the expectation-maximization (EM) algorithm (Slatkin and Excoffier, 1996). The LD value of neighboring genotypes/loci according to the present disclosure is selected to be more than 0.1, preferably more than 0.2, more preferably more than 0.5, more preferably more than 0.6, more preferably more than 0.7, preferably more than 0.8, more preferably more than 0.9, ideally about 1.0.
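The two LD measures named above can be computed from allele and haplotype frequencies using their standard textbook definitions. The following Python sketch assumes the haplotype frequency is already known (for example, from an EM estimate, which is not implemented here); the function name and the input values are hypothetical.

```python
from math import sqrt


def linkage_disequilibrium(p_a, p_b, p_ab):
    """Return (D, D', r) for two biallelic loci.

    p_a  : frequency of allele A at the first locus
    p_b  : frequency of allele B at the second locus
    p_ab : frequency of the A-B haplotype
    """
    if not (0 < p_a < 1 and 0 < p_b < 1):
        raise ValueError("both loci must be polymorphic")
    d = p_ab - p_a * p_b
    # Lewontin's D' rescales D by its maximum attainable magnitude.
    if d > 0:
        d_max = min(p_a * (1 - p_b), (1 - p_a) * p_b)
    else:
        d_max = min(p_a * p_b, (1 - p_a) * (1 - p_b))
    d_prime = abs(d) / d_max if d != 0 else 0.0
    # Pearson correlation coefficient between the alleles at the two loci.
    r = d / sqrt(p_a * (1 - p_a) * p_b * (1 - p_b))
    return d, d_prime, r


print(linkage_disequilibrium(0.5, 0.5, 0.5))   # alleles always co-occur
print(linkage_disequilibrium(0.5, 0.5, 0.25))  # alleles are independent
```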

Another method by which one of skill in the art can readily identify polymorphisms in linkage disequilibrium with a polymorphism of the present disclosure is to determine the LOD score for two loci. LOD stands for "logarithm of the odds", a statistical estimate of whether two genes, or a gene and a disease gene, are likely to be located near each other on a chromosome and are therefore likely to be inherited together. A LOD score of about 2 to 3 or higher is generally understood to mean that the two genes are located close to each other on the chromosome. Various examples of polymorphisms in linkage disequilibrium with the polymorphisms of the present disclosure are shown in tables 1 to 4. The inventors found that many polymorphisms in linkage disequilibrium with the polymorphisms of the present disclosure have LOD scores of about 2 to 50. Thus, in one embodiment, the LOD values of neighboring genotypes/loci according to the present disclosure are selected to be more than 2, more than 3, more than 4, more than 5, more than 6, more than 7, more than 8, more than 9, more than 10, more than 20, more than 30, more than 40, or more than 50.
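For intuition, the textbook two-point LOD score compares the likelihood of the observed meioses at a recombination fraction theta with the likelihood under free recombination (theta = 0.5). The Python sketch below uses that standard formula under the simplifying assumption of phase-known meioses; it is not taken from this disclosure, and the function name and counts are hypothetical.

```python
from math import log10


def lod_score(recombinants, non_recombinants, theta):
    """Two-point LOD score for phase-known meioses.

    LOD = log10( L(theta) / L(0.5) ), where
    L(theta) = theta**recombinants * (1 - theta)**non_recombinants.
    """
    if not 0 < theta <= 0.5:
        raise ValueError("theta must be in the interval (0, 0.5]")
    n = recombinants + non_recombinants
    log_likelihood_linked = (
        recombinants * log10(theta) + non_recombinants * log10(1 - theta)
    )
    log_likelihood_unlinked = n * log10(0.5)
    return log_likelihood_linked - log_likelihood_unlinked


# Hypothetical data: 1 recombinant among 10 informative meioses.
print(round(lod_score(1, 9, 0.1), 2))
```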

In another embodiment, a polymorphism in linkage disequilibrium with a polymorphism of the present disclosure may have a genetic recombination distance of less than or equal to about 20 centimorgans (cM). For example, 15cM or less, 10cM or less, 9cM or less, 8cM or less, 7cM or less, 6cM or less, 5cM or less, 4cM or less, 3cM or less, 2cM or less, 1cM or less, 0.75cM or less, 0.5cM or less, 0.25cM or less, or 0.1cM or less. For example, two linked loci within a single chromosome segment can recombine with each other during meiosis at a frequency of less than or equal to about 20%, about 19%, about 18%, about 17%, about 16%, about 15%, about 14%, about 13%, about 12%, about 11%, about 10%, about 9%, about 8%, about 7%, about 6%, about 5%, about 4%, about 3%, about 2%, about 1%, about 0.75%, about 0.5%, about 0.25%, or about 0.1% or less.

In another embodiment, polymorphisms in linkage disequilibrium with polymorphisms of the present disclosure are within 100kb (which in humans corresponds to about 0.1cM, depending on the local recombination rate), 50kb, 20kb, or less of each other.

For example, one approach for identifying surrogate markers for a particular polymorphism involves a simple strategy that assumes that polymorphisms surrounding the target polymorphism are in linkage disequilibrium with it and can therefore provide information about disease susceptibility. Thus, as described herein, surrogate markers can be identified from public databases such as HAPMAP by searching for polymorphisms that meet criteria recognized in the scientific community as suitable for selecting surrogate marker candidates (see, e.g., legends to tables 1-4).

"Allele frequency" refers to the frequency (proportion or percentage) at which an allele is present at a locus within an individual, within a line, or within a population of lines. For example, for allele "A", diploid individuals of genotype "AA", "Aa", or "aa" have an allele frequency of 1.0, 0.5, or 0.0, respectively. Allele frequencies in a line or population (e.g., cases or controls) can be estimated by averaging the allele frequencies of a sample of individuals from that line or population. Similarly, allele frequencies in a population of lines can be calculated by averaging the allele frequencies of the lines that make up the population.

In one embodiment, the term "allele frequency" is used to define a Minor Allele Frequency (MAF). MAF refers to the frequency of occurrence of the least common allele in a given population.
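The allele frequency and MAF definitions above can be sketched in a few lines. The function names and the sample genotypes below are hypothetical and chosen only to mirror the "AA"/"Aa"/"aa" example in the text; the MAF helper assumes a biallelic locus.

```python
def individual_allele_frequency(genotype: str, allele: str) -> float:
    """Fraction of an individual's alleles at a locus that equal `allele`.

    A diploid genotype is written as a two-character string, e.g. "Aa".
    """
    return genotype.count(allele) / len(genotype)


def population_allele_frequency(genotypes: list[str], allele: str) -> float:
    """Average of the individual allele frequencies across a sample."""
    frequencies = [individual_allele_frequency(g, allele) for g in genotypes]
    return sum(frequencies) / len(frequencies)


def minor_allele_frequency(genotypes: list[str], allele: str) -> float:
    """MAF for a biallelic locus: the frequency of the less common allele."""
    frequency = population_allele_frequency(genotypes, allele)
    return min(frequency, 1.0 - frequency)


# Hypothetical sample of four diploid individuals.
sample = ["AA", "Aa", "aa", "aa"]
print(population_allele_frequency(sample, "A"))  # 0.375
print(minor_allele_frequency(sample, "A"))       # 0.375
```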

An individual is "homozygous" if the individual has only one type of allele at a given locus (e.g., a diploid individual has a copy of the same allele at the locus on each of two homologous chromosomes). An individual is "heterozygous" if more than one allele type is present at a given locus (e.g., a diploid individual with one copy each of two different alleles). The term "homogeneity" means that the members of a group have the same genotype at one or more specific loci. In contrast, the term "heterogeneity" is used to indicate that individuals within a group differ in genotype at one or more specific loci.

A "locus" is a chromosomal location or region. For example, a polymorphic locus is a location or region at which a polymorphic nucleic acid, trait determinant, gene, or marker is located. In another example, a "gene locus" is a particular chromosomal location (region) in the genome of a species where a particular gene can be found.

"Marker", "molecular marker" or "marker nucleic acid" refers to a nucleotide sequence or its encoded product (e.g., a protein) that is used as a point of reference when identifying a locus or a linked locus. Markers can be derived from genomic nucleotide sequences or from expressed nucleotide sequences (e.g., from RNA, nRNA, mRNA, cDNA, etc.) or from encoded polypeptides. The term also refers to nucleic acid sequences that are complementary to or flank the marker sequences, e.g., nucleic acids used as probes or as primer pairs capable of amplifying the marker sequence. A "marker probe" is a nucleic acid sequence or molecule that can be used to identify the presence of a marker locus, e.g., a nucleic acid probe that is complementary to a sequence of a marker locus. Nucleic acids are "complementary" when they specifically hybridize in solution, for example, according to the Watson-Crick base pairing rules. A "marker locus" is a locus that can be used to track the presence of a second linked locus, e.g., a linked or correlated locus that encodes or contributes to the population variation of a phenotypic trait. For example, a marker locus can be used to monitor segregation of alleles at loci (e.g., QTLs) that are genetically or physically linked to the marker locus. Thus, a "marker allele" or "allele of a marker locus" is one of a plurality of polymorphic nucleotide sequences found at a marker locus in a population that is polymorphic for the marker locus. Each marker identified is expected to be in close physical and genetic proximity (resulting in physical and/or genetic linkage) to a genetic element (e.g., a QTL) that contributes to the relevant phenotype. Markers corresponding to genetic polymorphisms between members of a population can be detected by art-recognized methods. These include, for example, DNA sequencing, PCR-based sequence-specific amplification methods, Restriction Fragment Length Polymorphism (RFLP) detection, isozyme marker detection, allele-specific hybridization detection (ASH), single nucleotide amplification detection, amplified variable sequence detection of the genome, detection of self-sustained sequence replication, simple sequence repeat detection (SSR), Single Nucleotide Polymorphism (SNP) detection, and Amplified Fragment Length Polymorphism (AFLP) detection.

In the context of nucleic acid amplification, the term "amplification" is any method that produces additional copies of a selected nucleic acid (or a transcribed form thereof). Typical amplification methods include various polymerase-based replication methods, including Polymerase Chain Reaction (PCR), ligase-mediated methods such as Ligase Chain Reaction (LCR), and RNA polymerase-based amplification (e.g., by transcription) methods.

An "amplicon" is an amplified nucleic acid, e.g., a nucleic acid produced by amplifying a template nucleic acid by any available amplification method (e.g., PCR, LCR, transcription, etc.).

A "gene" is one or more nucleotide sequences in a genome that together encode one or more expression molecules (e.g., RNA or polypeptides). The gene may include coding sequences that are transcribed into RNA, which may then be translated into polypeptide sequences, and may include associated structural or regulatory sequences that facilitate gene replication or expression.

A "genotype" is the genetic makeup of an individual (or group of individuals) at one or more genetic loci. A genotype is defined by the alleles of one or more known loci of an individual, usually a compilation of alleles inherited from their parents.

A "haplotype" is the genotype of an individual at multiple genetic loci on a single DNA strand. Typically, the genetic loci described by the haplotypes are physically and genetically linked, i.e., on the same chromosomal strand.

A "set" of markers, probes or primers refers to a collection or group of markers, probes or primers, or data derived therefrom, that are used for a common purpose, such as identifying individuals with a specified genotype (e.g., risk of developing breast cancer). Typically, data corresponding to the markers, probes or primers, or resulting from their use, is stored in an electronic medium. While each member of the set has utility for the specified purpose, individual markers selected from the set, as well as subsets that include some but not all of the markers, are also effective in achieving the specified purpose.

The polymorphisms and genes described above, as well as the corresponding marker probes, amplicons, or primers, can be embodied in any of the systems herein in the form of physical nucleic acids or in the form of system instructions that include sequence information for the nucleic acids. For example, the system can include primers or amplicons corresponding to (or amplifying a portion of) a gene or polymorphism described herein. As in the above methods, the set of marker probes or primers optionally detects a plurality of polymorphisms in a plurality of said genes or genetic loci. Thus, for example, the set of marker probes or primers detects at least one polymorphism in each of these polymorphisms or genes, or any other polymorphism, gene, or locus defined herein. Any such probe or primer may include the nucleotide sequence of any such polymorphism or gene or its complementary nucleic acid or its transcription product (e.g., an nRNA or mRNA form produced by the genomic sequence, e.g., by transcription or splicing).

As used herein, "risk assessment" refers to a method by which a subject's risk of developing breast cancer can be assessed. Risk assessment generally involves obtaining information relating to the subject's risk of developing breast cancer, assessing that information, and quantifying the subject's risk of developing breast cancer, e.g., by generating a risk score.

As used herein, "receiver operating characteristic curve" (ROC) refers to a plot of sensitivity versus (1 - specificity) for a binary classifier system as its discrimination threshold is varied. The ROC can be expressed equivalently by plotting the true positive rate (TPR) against the false positive rate (FPR). It is also known as a relative operating characteristic curve, because it compares two operating characteristics (TPR and FPR) as the criterion changes. ROC analysis provides a tool to select possibly optimal models and to discard sub-optimal ones independently of (and prior to specifying) the cost context or the class distribution. Methods of using ROC analysis in the context of the present disclosure will be clear to those skilled in the art.
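The ROC definition above can be illustrated with a short sketch that sweeps the discrimination threshold over a set of risk scores and records the resulting (FPR, TPR) pairs, then integrates the curve with the trapezoidal rule. The function names, scores and outcome labels are hypothetical and are not taken from the disclosure.

```python
def roc_points(scores: list[float], labels: list[int]) -> list[tuple[float, float]]:
    """Return (FPR, TPR) pairs as the threshold is lowered from high to low.

    `labels` holds 1 for cases (e.g., breast cancer) and 0 for controls.
    A subject is called positive when her score is >= the threshold.
    """
    positives = sum(labels)
    negatives = len(labels) - positives
    points = [(0.0, 0.0)]  # threshold above every score: nothing called positive
    for threshold in sorted(set(scores), reverse=True):
        true_pos = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 1)
        false_pos = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 0)
        points.append((false_pos / negatives, true_pos / positives))
    return points


def area_under_curve(points: list[tuple[float, float]]) -> float:
    """Trapezoidal area under the ROC curve (AUC)."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2.0
    return area


# Hypothetical risk scores for three cases (1) and three controls (0).
scores = [0.9, 0.8, 0.7, 0.4, 0.3, 0.2]
labels = [1, 1, 0, 1, 0, 0]
curve = roc_points(scores, labels)
print(f"AUC = {area_under_curve(curve):.3f}")
```

Comparing the AUC of two candidate risk models on the same subjects is one common way of using ROC analysis to prefer one model over another.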

The phrase "combining a first clinical risk assessment, a second clinical risk assessment, and a genetic risk assessment" as used herein refers to any suitable mathematical analysis that relies on the results of the assessments. For example, the results of the first clinical risk assessment, the second clinical risk assessment and the genetic risk assessment may be added or, more preferably, multiplied.
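A minimal sketch of the two combination rules named above (addition and multiplication) follows. The function name, the choice to represent each assessment as a single number, and the example values are all assumptions made for illustration; the disclosure itself permits any suitable mathematical analysis.

```python
import math


def combine_assessments(results: list[float], method: str = "multiply") -> float:
    """Combine the numerical results of several risk assessments.

    `results` is assumed to hold one number per assessment, for example
    [first clinical, second clinical (breast density), genetic].
    """
    if method == "multiply":
        return math.prod(results)
    if method == "add":
        return sum(results)
    raise ValueError("method must be 'multiply' or 'add'")


# Hypothetical values: a 5-year clinical risk of 1.5%, a relative risk of 1.4
# attributed to breast density, and a relative risk of 1.2 from genotyping.
overall = combine_assessments([0.015, 1.4, 1.2], method="multiply")
print(f"Combined 5-year risk: {overall:.4f}")
```

Multiplication treats the three components as independent multiplicative factors, which is one reason it may be preferred when the second and third results are expressed as relative risks.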

As used herein, the terms "conventional screening for breast cancer" and "more frequent screening" are relative terms, based on a comparison with the level of screening recommended for subjects with no identified risk of developing breast cancer.

Clinical risk assessment

In one embodiment, the first and/or second clinical risk assessment procedures comprise obtaining clinical information from a female subject. In other embodiments, these details have already been determined (such as in the subject's medical record).

In one embodiment, the first clinical risk assessment procedure comprises obtaining information from the female subject regarding one or more of: medical history of breast cancer, ductal carcinoma or lobular carcinoma, age, menstrual history such as age at first menstrual period, age at which she first gave birth, family history of breast cancer or other cancer (including the age of the relative at diagnosis), previous breast biopsy results, use of oral contraceptives, body mass index, alcohol consumption history, smoking history, exercise history, diet and race/ethnicity. Examples of clinical risk assessment procedures include, but are not limited to: the Gail model (Gail et al, 1989, 1999 and 2007; Costantino et al, 1999; Rockhill et al, 2001), the Claus model (Claus et al, 1994 and 1998), Claus tables, BOADICEA (Antoniou et al, 2002 and 2004), the Jonker model (Jonker et al, 2003), the Claus Extended Formula (van Asperen et al, 2004), the Tyrer-Cuzick model (Tyrer et al, 2004) and the Manchester scoring system (Evans et al, 2004), and the like.

In one embodiment, the first clinical risk assessment is obtained using a Gail model. Such a procedure may be used to estimate a 5-year risk or lifetime risk for a human female subject. The Gail model is a statistical model that forms the basis of a breast cancer risk assessment tool and is named after Dr. Mitchell Gail, Senior Investigator in the Biostatistics Branch of the NCI Division of Cancer Epidemiology and Genetics. The model uses a woman's own personal medical history (number of previous breast biopsies and presence of atypical hyperplasia in any previous breast biopsy specimen), her own reproductive history (age at the start of menstruation and age at the first live birth of a child) and the history of breast cancer among her first-degree relatives (mother, sisters, daughters) to estimate her risk of developing invasive breast cancer over a specific period. The model was developed using data from the Breast Cancer Detection Demonstration Project (BCDDP), a breast cancer screening study conducted jointly by NCI and the American Cancer Society that involved 280,000 women aged 35 to 74 years, and from NCI's Surveillance, Epidemiology, and End Results (SEER) Program. Estimates for African American women are based on data from the Women's Contraceptive and Reproductive Experiences (CARE) Study and from SEER. CARE participants included 1,607 women with invasive breast cancer and 1,637 women without invasive breast cancer.

The Gail model has been tested in large populations of white women and has been shown to provide accurate estimates of breast cancer risk. In other words, the model has been "validated" for white women. It has also been tested using data from the Women's Health Initiative for African American women; the model performs well but may underestimate risk in African American women who have previously had a biopsy. The model has also been validated for Hispanic women, Asian American women, and Native American women.

In another embodiment, the first clinical risk assessment is obtained using a Tyrer-Cuzick model. The Tyrer-Cuzick model combines both genetic and non-genetic factors (Tyrer et al, 2004). Nonetheless, the Tyrer-Cuzick model is considered separate from the genetic risk assessment outlined in the present disclosure. Tyrer-Cuzick uses a three-generation pedigree to estimate the likelihood that an individual carries either a BRCA1/BRCA2 mutation or a hypothetical low-penetrance gene. In addition, the model incorporates personal risk factors such as parity, body mass index, height, and age at menarche, menopause, HRT use and first live birth.

In another embodiment, the first clinical risk assessment is obtained using a BOADICEA model. The BOADICEA model was designed using segregation analysis, in which susceptibility is explained by mutations in BRCA1 and BRCA2 as well as by a polygenic component reflecting the multiplicative effects of multiple genes, each of which has a small effect on breast cancer risk (Antoniou et al, 2002 and 2004). The algorithm allows prediction of BRCA1/BRCA2 mutation probabilities and estimation of cancer risk in individuals with a family history of breast cancer.

In another embodiment, the first clinical risk assessment is obtained using a BRCAPRO model. The BRCAPRO model is a Bayesian model that incorporates published BRCA1 and BRCA2 mutation frequencies, cancer penetrance in mutation carriers, cancer status (affected, unaffected, unknown) and age of the patient's first- and second-degree relatives (Parmigiani et al, 1998). The algorithm allows prediction of BRCA1/BRCA2 mutation probabilities and estimation of cancer risk in individuals with a family history of breast cancer.

In another embodiment, the first clinical risk assessment is obtained using a Claus model. The Claus model provides an assessment of hereditary risk of breast cancer. The model was developed using data from the Cancer and Steroid Hormone Study. The model originally included only data on family history of breast cancer (Claus et al, 1991), but was later updated to include data on family history of ovarian cancer (Claus et al, 1993). In practice, lifetime risk estimates are usually derived from the so-called Claus tables (Claus et al, 1994). The model was further modified to incorporate information on bilateral disease, ovarian cancer and three or more affected relatives, and is referred to as the "Claus extended model" (van Asperen et al, 2004).

In one embodiment, the first clinical risk assessment does not take into account breast density.

In one embodiment, the first clinical risk assessment takes into account at least the age of the woman. In another embodiment, the first clinical risk assessment is based solely on the age of the female subject and her family history of breast cancer. In this embodiment, the first clinical risk assessment may optionally also consider ethnicity. Thus, in another embodiment, the first clinical risk assessment is based solely on the female subject's family history of breast cancer and ethnicity. In another embodiment, the first clinical risk assessment is based solely on the age and ethnicity of the female subject. In another embodiment, the first clinical risk assessment is based solely on the age, family history of breast cancer and ethnicity of the female subject.

In one embodiment, the family history of breast cancer in the female subject is based only on the first degree relatives of the female subject.

In another embodiment, the family history of breast cancer in the female subject is based on the first- and second-degree relatives of the female subject.

"Family history of breast cancer" is used in the context of the present disclosure to refer to a history of breast cancer in a first- and/or second-degree relative of a female subject. For example, "family history of breast cancer" can be used to refer to a history of breast cancer in first-degree relatives only. In other words, the first clinical risk assessment procedure may take into account the family history of breast cancer in the first-degree relatives of the female subject. In the context of the present disclosure, a "first-degree relative" is a family member who shares about 50% of their genes with the female subject. Examples of first-degree relatives include parents, children and full siblings. A "second-degree relative" is a family member who shares about 25% of their genes with the female subject. Examples of second-degree relatives include uncles, aunts, nephews, nieces, grandparents, grandchildren and half-siblings.

In one embodiment, the first clinical risk assessment takes into account at least age, number of previous breast biopsies, and known medical history in first-degree relatives. In one embodiment, the first clinical risk assessment takes into account at least age, number of previous breast biopsies, and known medical history in first- and second-degree relatives. In one embodiment, the first clinical risk assessment does not take into account third-degree or more distant relatives.

In one embodiment, the first clinical risk assessment is based only on the age of the female subject and the known history of breast cancer in her first-degree relatives. In another embodiment, the first clinical risk assessment is based on the age of the female subject, the known history of breast cancer in her first-degree relatives, and her ethnicity.

As used herein, "based on" means that a value is assigned to, for example, the subject's age and family history of breast cancer, after which any suitable calculation can be performed to determine clinical risk.

Female subjects may self-report clinical information. For example, the subject may complete a questionnaire designed to obtain clinical information such as age, history of breast cancer in first-degree relatives, and ethnicity. In another embodiment, the clinical information may be obtained from medical records by querying a relevant database containing the clinical information, subject to obtaining informed consent from the female subject.

In one embodiment, the first clinical risk assessment procedure provides an estimate of the risk of a human female subject developing breast cancer during the next 5 years (i.e., a 5-year risk).

In another embodiment, the first clinical risk assessment procedure provides an estimate of the risk of a human female subject developing breast cancer before the age of 90 years (i.e., lifetime risk).

In another embodiment, the first clinical risk assessment is performed using a model that calculates an absolute risk of developing breast cancer. For example, the absolute risk of developing breast cancer can be calculated using the incidence of cancer while considering the competing risk of death from other causes besides breast cancer.
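One way to account for competing mortality when computing an absolute risk is sketched below. It is a simplified discrete-time calculation with piecewise-constant annual hazards; the function name and all rate values are hypothetical, and published models such as the Gail model apply further adjustments (for example to the baseline incidence) that are not reproduced here.

```python
import math


def absolute_risk(relative_risk: float,
                  incidence: list[float],
                  competing_mortality: list[float]) -> float:
    """Absolute risk of breast cancer over the projection interval.

    incidence[i]           -- baseline breast cancer hazard in year i
    competing_mortality[i] -- hazard of death from other causes in year i
    Hazards are assumed constant within each year.
    """
    event_free = 1.0  # probability of being alive and cancer-free so far
    risk = 0.0
    for h_cancer, h_death in zip(incidence, competing_mortality):
        h_subject = relative_risk * h_cancer
        h_total = h_subject + h_death
        if h_total > 0.0:
            # Probability that any event occurs this year, times the share
            # of those events that are breast cancer rather than death.
            p_any_event = 1.0 - math.exp(-h_total)
            risk += event_free * p_any_event * (h_subject / h_total)
        event_free *= math.exp(-h_total)
    return risk


# Hypothetical rates for a 5-year projection.
five_year = absolute_risk(relative_risk=1.5,
                          incidence=[0.003] * 5,
                          competing_mortality=[0.004] * 5)
print(f"5-year absolute risk: {five_year:.4f}")
```

Passing 5 or 10 annual rates yields a 5-year or 10-year absolute risk, respectively.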

In one embodiment, the first clinical risk assessment provides a 5-year absolute risk of developing breast cancer. In another embodiment, the first clinical risk assessment provides a 10-year absolute risk of developing breast cancer.

The second clinical risk assessment is based on at least breast density. In one embodiment, the second clinical risk assessment is based solely on breast density.

Breast density can be measured using any method known in the art. For example, breast density can be estimated based on the radiographic appearance of the breast on a mammogram. As will be appreciated by those skilled in the art, dense breast tissue appears brighter on mammography and includes epithelial and stromal tissue, while non-dense tissue including fat appears darker. Thus, in some embodiments, breast density is assessed using mammography.

In one embodiment, a higher pixel intensity threshold is used to assess breast density.

In one embodiment, percent dense area (percent dense area) is used to assess breast density. The percentage of dense regions can be calculated by dividing the area of dense breast tissue by the total breast area determined in the breast image (e.g., mammogram).
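The percent dense area calculation described above can be sketched as follows. The pixel-intensity threshold used to label tissue as dense, the breast mask, the function name and the toy image are all assumptions made for illustration; in practice the dense region is typically delineated by a trained reader or dedicated software.

```python
def percent_dense_area(image: list[list[int]],
                       breast_mask: list[list[int]],
                       dense_threshold: int) -> float:
    """Percent dense area = 100 * dense breast pixels / total breast pixels.

    image        -- 2D grid of pixel intensities (brighter = denser tissue)
    breast_mask  -- 2D grid, 1 where the pixel lies within the breast
    dense_threshold -- intensity at or above which a pixel counts as dense
    """
    breast_pixels = 0
    dense_pixels = 0
    for image_row, mask_row in zip(image, breast_mask):
        for intensity, in_breast in zip(image_row, mask_row):
            if in_breast:
                breast_pixels += 1
                if intensity >= dense_threshold:
                    dense_pixels += 1
    if breast_pixels == 0:
        raise ValueError("breast mask contains no breast pixels")
    return 100.0 * dense_pixels / breast_pixels


# Hypothetical 3 x 4 image; the last column is background outside the breast.
image = [[10, 200, 220, 0],
         [30, 180, 40, 0],
         [20, 25, 210, 0]]
mask = [[1, 1, 1, 0],
        [1, 1, 1, 0],
        [1, 1, 1, 0]]
print(f"{percent_dense_area(image, mask, dense_threshold=150):.1f}% dense")
```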

In one embodiment, breast density is assessed using Cumulus percent dense area. In another embodiment, breast density is assessed using Cumulus percentages of dense and non-dense area. "Cumulus" is a software package for semi-automated measurement of dense area from mammograms and is described in Byng et al. (1994).

In one embodiment, breast density is assessed using the BI-RADS score. "BI-RADS" is an acronym for Breast Imaging Reporting and Data System, a standardized numerical coding system typically assigned by a radiologist after interpreting a mammogram, and is used to communicate a subject's risk of breast cancer. BI-RADS scores can also be obtained using automated computerized methods. Typical BI-RADS assessment categories (BI-RADS Atlas) are:

0: incomplete evaluation;

1: negative;

2: benign;

3: probably benign;

4: suspicious;

5: highly suggestive of malignancy; and

6: known biopsy-proven malignancy.

Genetic risk assessment

In one embodiment, the genetic risk assessment is performed by analyzing the genotype of the subject for polymorphisms associated with breast cancer at 2 or more loci. Various exemplary polymorphisms associated with breast cancer are discussed in this disclosure. These polymorphisms differ in penetrance and those skilled in the art will appreciate that many polymorphisms are low penetrance.

The term "penetrance" is used in the context of the present disclosure to refer to the frequency with which a particular polymorphism is manifested in a female subject suffering from breast cancer. A "high penetrance" polymorphism is almost always evident in female subjects with breast cancer, whereas a "low penetrance" polymorphism is only sometimes evident. In one embodiment, the polymorphism assessed as part of a genetic risk assessment according to the present disclosure is a low penetrance polymorphism.

As the skilled person will appreciate, each polymorphism that increases the risk of developing breast cancer has an odds ratio of association with breast cancer of greater than 1.0. In one embodiment, the odds ratio is greater than 1.02. Each polymorphism that reduces the risk of developing breast cancer has an odds ratio of association with breast cancer of less than 1.0. In one embodiment, the odds ratio is less than 0.98. Examples of such polymorphisms include, but are not limited to, those provided in tables 6-14 or polymorphisms in linkage disequilibrium with one or more thereof. In one embodiment, the genetic risk assessment involves assessing polymorphisms associated with increased risk of developing breast cancer. In another embodiment, the genetic risk assessment involves assessing polymorphisms associated with a reduced risk of developing breast cancer. In another embodiment, the genetic risk assessment involves assessing polymorphisms associated with increased risk of developing breast cancer and polymorphisms associated with decreased risk of developing breast cancer.
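The odds ratio thresholds mentioned above (greater than 1.02 and less than 0.98) can be applied as a simple filter. The sketch below is illustrative only: the function name, the third "between thresholds" label, and the odds ratio values are hypothetical and do not come from the tables of the disclosure.

```python
def classify_by_odds_ratio(odds_ratio: float,
                           upper: float = 1.02,
                           lower: float = 0.98) -> str:
    """Label a polymorphism by the direction of its association."""
    if odds_ratio > upper:
        return "risk-increasing"
    if odds_ratio < lower:
        return "risk-decreasing"
    # Assumption: values between the thresholds are set aside, not scored.
    return "between thresholds"


# Hypothetical odds ratios keyed by placeholder identifiers.
example_odds_ratios = {"snp_a": 1.26, "snp_b": 0.93, "snp_c": 1.01}
for name, value in example_odds_ratios.items():
    print(name, classify_by_odds_ratio(value))
```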

In one embodiment, the genetic risk assessment is performed by analyzing the genotype of the subject for polymorphisms associated with breast cancer at 2, 3, 4, 5, 6, 7, 8, 9, 10 or more loci. Exemplary polymorphisms associated with breast cancer risk assessment include rs2981582, rs3803662, rs889312, rs13387042, rs13281615, rs4415084, rs3817198, rs4973768, rs6504950 and rs11249433, or polymorphisms in linkage disequilibrium with one or more thereof.

In another embodiment, the genetic risk assessment is performed by analyzing the genotype of the subject at 20, 30, 40, 50, 60, 70, 80, 100, 120, 140, 160, 180, 200 or more loci for polymorphisms associated with breast cancer.

In one embodiment, the genetic risk assessment is performed by analyzing the genotype of the subject for polymorphisms associated with breast cancer at 72 or more loci. In one embodiment, the genetic risk assessment is performed by analyzing the genotype of the subject for polymorphisms associated with breast cancer at 150 or more loci. In one embodiment, the genetic risk assessment is performed by analyzing the genotype of the subject for polymorphisms associated with breast cancer at 200 or more loci.

In one embodiment, when performing the methods of the present disclosure to assess the risk of breast cancer, at least 67 polymorphisms are selected from table 7 or polymorphisms in linkage disequilibrium with one or more thereof, and the remaining polymorphisms are selected from table 6 or polymorphisms in linkage disequilibrium with one or more thereof. In another embodiment, when performing the methods of the present disclosure, at least 68, at least 69, or at least 70 polymorphisms are selected from table 7 or polymorphisms in linkage disequilibrium with one or more thereof, and the remaining polymorphisms are selected from table 6 or polymorphisms in linkage disequilibrium with one or more thereof. In one embodiment, at least 72, at least 73, at least 74, at least 75, at least 76, at least 77, at least 78, at least 79, at least 80, at least 81, at least 82, at least 83, at least 84, at least 85, at least 86, at least 87, or at least 88 polymorphisms shown in table 6, or polymorphisms in linkage disequilibrium with one or more thereof, are assessed. In other embodiments, at least 67, at least 68, at least 69, or at least 70 polymorphisms shown in table 7, or polymorphisms in linkage disequilibrium with one or more thereof, are assessed. In other embodiments, at least 70, at least 71, at least 72, at least 73, at least 74, at least 75, at least 76, at least 77, at least 78, at least 79, at least 80, at least 81, at least 82, at least 83, at least 84, at least 85, at least 86, at least 87, or at least 88 polymorphisms are assessed, of which at least 67, at least 68, at least 69, or at least 70 are polymorphisms shown in table 7, or polymorphisms in linkage disequilibrium with one or more thereof, and any remaining polymorphisms are selected from table 6, or polymorphisms in linkage disequilibrium with one or more thereof.

In some embodiments, when performing the methods of the present disclosure to assess the risk of breast cancer, the one or more polymorphisms are selected from table 12 or polymorphisms in linkage disequilibrium with one or more thereof. In one embodiment, at least 50 polymorphisms are selected from polymorphisms from table 12 or in linkage disequilibrium with one or more thereof. In one embodiment, at least 60, at least 70, at least 80, at least 90, at least 100, at least 110, at least 120, at least 130, at least 140, at least 150, at least 160, at least 170, at least 180, at least 190, at least 200 polymorphisms are selected from polymorphisms in table 12 or in linkage disequilibrium with one or more thereof. In one embodiment, at least 100 polymorphisms are selected from polymorphisms from table 12 or in linkage disequilibrium with one or more thereof. In one embodiment, at least 150 polymorphisms are selected from polymorphisms from table 12 or in linkage disequilibrium with one or more thereof. In one embodiment, at least 200 polymorphisms are selected from polymorphisms from table 12 or in linkage disequilibrium with one or more thereof.

In one embodiment, when determining the risk of breast cancer, the methods of the present disclosure comprise detecting at least 50, at least 100, or at least 150 polymorphisms shown in table 12, or polymorphisms in linkage disequilibrium with one or more thereof. In one embodiment, when determining the risk of breast cancer, the method of the present disclosure comprises detecting all 203 polymorphisms shown in table 12, or polymorphisms in linkage disequilibrium with one or more thereof.

In one embodiment, when determining the risk of breast cancer, the methods of the present disclosure comprise detecting at least 50, 80, 100, 150 polymorphisms shown in table 12, or polymorphisms in linkage disequilibrium with one or more thereof.

Those skilled in the art can readily identify polymorphisms in linkage disequilibrium with those specifically mentioned herein. Examples of such polymorphisms include rs1219648 and rs2420946, which are in strong linkage disequilibrium with rs2981582 (further possible examples are provided in table 1), rs12443621 and rs8051542, which are in strong linkage disequilibrium with polymorphism rs3803662 (further possible examples are provided in table 2), and rs10941679, which is in strong linkage disequilibrium with polymorphism rs4415084 (further possible examples are provided in table 3). In addition, table 4 provides examples of polymorphisms in linkage disequilibrium with rs13387042. Such linked polymorphisms for the other polymorphisms listed in table 6 or table 12 can readily be identified by the skilled person using the HAPMAP database.

Table 1. Surrogate markers for polymorphism rs2981582. Markers with an r2 greater than 0.05 relative to rs2981582 were selected from the HAPMAP dataset (http://hapmap.ncbi.nlm.nih.gov) within a 1Mbp interval flanking the marker. The name of the correlated polymorphism, its r2 and D' values relative to rs2981582 and the corresponding LOD value are shown, together with the position of the surrogate marker in NCBI Build 36.

[Table images not reproduced: BDA0002539373160000211, BDA0002539373160000221]

Table 2. Surrogate markers for polymorphism rs3803662. Markers with an r2 greater than 0.05 relative to rs3803662 were selected from the HAPMAP dataset (http://hapmap.ncbi.nlm.nih.gov) within a 1Mbp interval flanking the marker. The name of the correlated polymorphism, its r2 and D' values relative to rs3803662 and the corresponding LOD value are shown, together with the position of the surrogate marker in NCBI Build 36.

Table 3. Surrogate markers for polymorphism rs4415084. Markers with an r2 greater than 0.05 relative to rs4415084 were selected from the HAPMAP dataset (http://hapmap.ncbi.nlm.nih.gov) within a 1Mbp interval flanking the marker. The name of the correlated polymorphism, its r2 and D' values relative to rs4415084 and the corresponding LOD value are shown, together with the position of the surrogate marker in NCBI Build 36.

[Table images not reproduced: BDA0002539373160000232, BDA0002539373160000241, BDA0002539373160000251]

Table 4. Surrogate markers for polymorphism rs13387042. Markers with an r2 greater than 0.05 relative to rs13387042 were selected from the HAPMAP dataset (http://hapmap.ncbi.nlm.nih.gov) within a 1Mbp interval flanking the marker. The name of the correlated polymorphism, its r2 and D' values relative to rs13387042 and the corresponding LOD value are shown, together with the position of the surrogate marker in NCBI Build 36.

[Table image not reproduced: BDA0002539373160000252]
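The selection criteria stated in the legends to tables 1 to 4 (r2 greater than 0.05 within a 1Mbp interval flanking the marker) can be expressed as a simple filter over linkage disequilibrium records. The sketch below is illustrative: the record layout, the function name, the identifiers and all numerical values are hypothetical, and the 1Mbp interval is interpreted here as up to 1Mbp on either side of the target polymorphism, which is an assumption.

```python
from typing import NamedTuple


class LdRecord(NamedTuple):
    """One candidate marker with its LD statistics relative to the target."""
    rs_id: str
    position: int    # base-pair coordinate on the same chromosome
    r2: float
    d_prime: float
    lod: float


def select_surrogates(records: list[LdRecord],
                      target_position: int,
                      min_r2: float = 0.05,
                      window_bp: int = 1_000_000) -> list[LdRecord]:
    """Keep markers with r2 above `min_r2` that lie within the window."""
    selected = [
        record for record in records
        if record.r2 > min_r2
        and abs(record.position - target_position) <= window_bp
    ]
    # Strongest correlation first, as a convenience for review.
    return sorted(selected, key=lambda record: record.r2, reverse=True)


# Hypothetical records; identifiers and coordinates are placeholders.
candidates = [
    LdRecord("rs_hypothetical_1", 123_400_000, 0.97, 1.0, 45.2),
    LdRecord("rs_hypothetical_2", 123_950_000, 0.04, 0.5, 1.1),
    LdRecord("rs_hypothetical_3", 125_000_000, 0.60, 0.9, 20.0),
    LdRecord("rs_hypothetical_4", 122_800_000, 0.30, 0.8, 8.5),
]
for record in select_surrogates(candidates, target_position=123_342_307):
    print(record.rs_id, record.r2, record.lod)
```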

In another embodiment, when determining the risk of breast cancer, the methods of the present disclosure encompass assessing all of the polymorphisms shown in table 6 or table 12, or polymorphisms in linkage disequilibrium with one or more thereof.

Some polymorphisms are listed in more than one of tables 6, 7 and 12. It will be appreciated that when selecting polymorphisms for assessment, the same polymorphism will not be selected twice. For convenience, the polymorphisms in table 6 have been divided into tables 7 and 8. Table 7 lists the polymorphisms common to Caucasian, African American and Hispanic populations. Table 8 lists the polymorphisms that are not common to Caucasian, African American and Hispanic populations.

In another embodiment, 72 to 88, 73 to 87, 74 to 86, 75 to 85, 76 to 84, 75 to 83, 76 to 82, 77 to 81, 78 to 80 polymorphisms are assessed, wherein at least 60, at least 61, at least 62, at least 63, at least 64, at least 65, at least 66, at least 67, at least 68, at least 69, at least 70 polymorphisms shown in table 7, or polymorphisms in linkage disequilibrium with one or more thereof, are assessed, and any remaining polymorphisms are selected from table 6, or polymorphisms in linkage disequilibrium with one or more thereof.

In one embodiment, the number of polymorphisms evaluated is based on the net reclassification improvement of risk prediction calculated using the Net Reclassification Index (NRI) (Pencina et al, 2008).

In one embodiment, the net reclassification improvement of the methods of the present disclosure is greater than 0.01.

In another embodiment, the net reclassification improvement of the methods of the present disclosure is greater than 0.05.

In yet another embodiment, the net reclassification improvement of the methods of the present disclosure is greater than 0.1.
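As an illustration of the quantity referred to above, the following is a minimal sketch of the categorical net reclassification improvement in its commonly used form: the net proportion of subjects with an event who move to a higher risk category, plus the net proportion of subjects without an event who move to a lower risk category. The function name, the integer coding of risk categories and the example data are assumptions made for illustration only and are not taken from the present disclosure.

```python
def net_reclassification_improvement(old_categories, new_categories, events):
    """Categorical NRI comparing a new risk model against an old one.

    old_categories, new_categories: ordinal risk categories per subject
        (hypothetical coding: larger integer = higher risk category).
    events: True if the subject developed the disease, else False.
    """
    up_event = down_event = n_event = 0
    up_nonevent = down_nonevent = n_nonevent = 0
    for old, new, had_event in zip(old_categories, new_categories, events):
        if had_event:
            n_event += 1
            up_event += new > old      # correctly moved to higher risk
            down_event += new < old    # incorrectly moved to lower risk
        else:
            n_nonevent += 1
            up_nonevent += new > old   # incorrectly moved to higher risk
            down_nonevent += new < old # correctly moved to lower risk
    if n_event == 0 or n_nonevent == 0:
        raise ValueError("need at least one event and one non-event")
    return ((up_event - down_event) / n_event
            + (down_nonevent - up_nonevent) / n_nonevent)


# Illustrative data only: two subjects with an event, two without.
nri = net_reclassification_improvement(
    old_categories=[0, 0, 1, 1],
    new_categories=[1, 0, 0, 1],
    events=[True, True, False, False],
)
```

Under this sketch, a value greater than 0.01, 0.05 or 0.1 would correspond to the thresholds given in the embodiments above.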

In another embodiment, the genetic risk assessment is performed by analyzing the genotype of the subject for polymorphisms associated with breast cancer at 90 or more loci. In another embodiment, the genetic risk assessment is performed by analyzing the genotype of the subject for polymorphisms associated with breast cancer at 100, 200, 300, 400, 500, 600, 700, 800, 900, 1,000, 5,000, 10,000, 50,000, 100,000 or more loci. In these embodiments, the one or more polymorphisms may be selected from tables 6 through 12.

Genotypic variation between populations

It is known to those skilled in the art that genotypes vary between different populations. This phenomenon is referred to as human genetic variation. Human genetic variation is often observed between populations of different ethnic backgrounds. Such variation is rarely consistent and is often shaped by various combinations of environmental and lifestyle factors. Because of this variation, it is often difficult to identify a set of genetic markers, such as polymorphisms, that remains informative across different populations, such as populations of different ethnic backgrounds.

Disclosed herein are selections of polymorphisms common to at least three ethnic group backgrounds that remain informative for assessing risk of developing breast cancer.

In one embodiment, the methods of the present disclosure can be used to assess the risk of developing breast cancer in human female subjects from a variety of ethnic backgrounds. For example, the female subject may be classified as Caucasoid, Australoid, Mongoloid or Negroid according to physical anthropology.

In one embodiment, the human female subject may be Caucasian, African American, Hispanic, Asian, Indian or Latino. In a preferred embodiment, the human female subject is Caucasian, African American or Hispanic. Thus, ethnicity can be taken into account as part of the clinical and/or genetic risk assessment.

In one embodiment, the human female subject is Caucasian and at least 72, at least 73, at least 74, at least 75, at least 76 or at least 77 polymorphisms selected from Table 9, or polymorphisms in linkage disequilibrium with one or more thereof, are assessed. Alternatively, all 77 polymorphisms shown in Table 9, or polymorphisms in linkage disequilibrium with one or more thereof, are assessed.

In another embodiment, the human female subject is Black or African American and at least 70, at least 71, at least 72, at least 73 or at least 74 polymorphisms selected from Table 10, or polymorphisms in linkage disequilibrium with one or more thereof, are assessed. Alternatively, at least 74 polymorphisms selected from Table 10, or polymorphisms in linkage disequilibrium with one or more thereof, are assessed.

In another embodiment, the human female subject is Black or African American and at least 70, at least 71, at least 72, at least 73 or at least 74 polymorphisms shown in Table 13, or polymorphisms in linkage disequilibrium with one or more thereof, are assessed. In one embodiment, the human female subject is Black or African American, and the methods described herein comprise detecting all 74 polymorphisms shown in Table 13, or polymorphisms in linkage disequilibrium with one or more thereof.

In another embodiment, the human female subject is Hispanic and at least 67, at least 68, at least 69, at least 70 or at least 71 polymorphisms selected from Table 11, or polymorphisms in linkage disequilibrium with one or more thereof, are assessed. Alternatively, at least 71 polymorphisms selected from Table 11, or polymorphisms in linkage disequilibrium with one or more thereof, are assessed.

In another embodiment, the human female subject may be Hispanic and at least 67, at least 68, at least 69, at least 70 or at least 71 polymorphisms shown in Table 14, or polymorphisms in linkage disequilibrium with one or more thereof, are assessed. In one embodiment, the human female subject may be Hispanic, and the methods described herein comprise detecting all 71 polymorphisms shown in Table 14, or polymorphisms in linkage disequilibrium with one or more thereof.

It is well known that, over time, there has been admixture between different ethnic groups. In practice, however, this does not affect the ability of the skilled person to practice the invention.

In the context of the present disclosure, a white-skinned female subject who is predominantly of European origin, either directly or indirectly through ancestry, is considered Caucasian. A Caucasian may have, for example, at least 75% Caucasian ancestry (for example, but not limited to, a female subject having at least three Caucasian grandparents).

In the context of the present disclosure, a female subject who is predominantly of Central or Southern African origin, either directly or indirectly through ancestry, is considered Black. For example, a Black subject may have at least 75% Black ancestry. In the context of the present disclosure, an American female subject with predominantly Black ancestry and black skin is considered African American. For example, an African American may have at least 75% Black ancestry. Similar principles apply to, for example, females of Black ancestry living in other countries, such as the United Kingdom, Canada and the Netherlands.

In the context of the present disclosure, a female subject who predominantly originates, either directly or indirectly through ancestry, from Spain or a Spanish-speaking country (e.g., a country of Central or South America) is considered Hispanic. For example, a Hispanic subject may have at least 75% Hispanic ancestry.

The terms "ethnic group" and "ethnicity" are used interchangeably in the context of this disclosure. In one embodiment, the genetic risk assessment can readily be practiced on the basis of the ethnic group to which the subject considers herself to belong. Thus, in one embodiment, the ethnicity of the human female subject is self-reported by the subject. For example, the female subject may be asked to identify her ethnicity in response to the question: "To what ethnic group do you belong?". In another example, the ethnicity of the female subject is obtained from medical records, after appropriate informed consent has been obtained from the subject, or from the opinion or observation of a clinician.

Calculating the composite polymorphism relative risk ("polymorphism risk")

A composite polymorphism relative risk score ("polymorphism risk") for an individual may be defined as the product of the genotype relative risk values for each polymorphism assessed. A log-additive risk model can then be used to define the three genotypes AA, AB and BB of a single biallelic polymorphism as having relative risk values of 1, OR and OR², under a rare disease model, where OR is the previously reported disease odds ratio for the high-risk allele B relative to the low-risk allele A. If the frequency of allele B is p, then the population frequencies of these genotypes are (1-p)², 2p(1-p) and p², assuming Hardy-Weinberg equilibrium. The genotype relative risk values for each polymorphism can then be scaled so that, based on these frequencies, the average relative risk in the population is 1. Specifically, given the unscaled population average relative risk:

$$\mu = (1-p)^2 + 2p(1-p)\,\mathrm{OR} + p^2\,\mathrm{OR}^2$$

the adjusted risk values 1/μ, OR/μ and OR²/μ are used for the AA, AB and BB genotypes, respectively. Missing genotypes are assigned a relative risk of 1.
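The calculation described above can be sketched as follows. This is a minimal illustration only: the function names, the marker names `snp_a`, `snp_b` and `snp_c`, and the odds ratios and allele frequencies are hypothetical placeholders and are not values from the tables of the present disclosure.

```python
from math import prod


def scaled_genotype_risks(odds_ratio, risk_allele_freq):
    """Scaled relative risks for genotypes AA, AB, BB (0, 1, 2 copies of B).

    Implements mu = (1-p)^2 + 2p(1-p)OR + p^2 OR^2 and returns
    (1/mu, OR/mu, OR^2/mu), so that the population average risk is 1.
    """
    p = risk_allele_freq
    unscaled = (1.0, odds_ratio, odds_ratio ** 2)
    # Genotype frequencies under Hardy-Weinberg equilibrium.
    frequencies = ((1 - p) ** 2, 2 * p * (1 - p), p ** 2)
    mu = sum(f * r for f, r in zip(frequencies, unscaled))
    return tuple(r / mu for r in unscaled)


def polymorphism_risk(genotypes, markers):
    """Product of scaled genotype relative risks over all assessed markers.

    genotypes: dict marker -> number of risk alleles (0, 1 or 2), or None
        (or absent) when the genotype is missing.
    markers: dict marker -> (odds_ratio, risk_allele_freq).
    """
    factors = []
    for name, (odds_ratio, freq) in markers.items():
        count = genotypes.get(name)
        if count is None:
            factors.append(1.0)  # missing genotype: relative risk of 1
        else:
            factors.append(scaled_genotype_risks(odds_ratio, freq)[count])
    return prod(factors)


# Hypothetical markers and values, for illustration only.
example_markers = {"snp_a": (1.2, 0.3), "snp_b": (1.1, 0.5), "snp_c": (0.9, 0.2)}
example_genotypes = {"snp_a": 2, "snp_b": 1, "snp_c": None}
example_score = polymorphism_risk(example_genotypes, example_markers)
```

Because each marker is scaled separately, a subject with no genotype data receives a score of exactly 1, the population average.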

Combined first clinical risk × second clinical risk × genetic risk

It is envisaged that the "risk" of a human female subject developing breast cancer may be provided as a relative risk (or risk ratio) or as an absolute risk, as required. In one embodiment, the first clinical risk assessment, the second clinical risk assessment and the genetic risk assessment are combined to obtain the "absolute risk" of the human female subject developing breast cancer. Absolute risk is the numerical probability that a human female subject will develop breast cancer within a specified period (e.g., 5 years, 10 years, 15 years, 20 years or more). It reflects the subject's overall risk of developing breast cancer, because the various risk factors are not considered in isolation.

In one embodiment, the absolute risk is determined by using any one or more of the following values:

cumulative incidence of breast cancer from birth to baseline age;

cumulative incidence of breast cancer from birth to baseline age plus 5 (or 10) years;

cumulative incidence of breast cancer from birth to 85 years of age;

survival from baseline age to baseline age plus 5 or 10 years; and

survival from baseline age to 85 years of age.

Data on breast cancer incidence and competing mortality can be obtained from a variety of sources. For example, such data may be obtained from the U.S. Surveillance, Epidemiology, and End Results (SEER) Program database.

In one embodiment, ethnicity-specific breast cancer incidence and competing mortality data are used in the above calculation. In one example, ethnicity-specific breast cancer incidence and competing mortality data can also be obtained from the SEER database.
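For illustration, one common way of converting a relative risk into an absolute risk using incidence and competing mortality data is sketched below. It assumes piecewise-constant yearly hazards and is not necessarily the calculation used in the present disclosure. The function name and the rates in the usage example are hypothetical; in practice the rates would come from a source such as SEER, and the relative risk would be the subject's combined relative risk scaled so that the population average is 1.

```python
from math import exp


def absolute_risk(relative_risk, incidence, mortality):
    """Probability of developing breast cancer over the projection period.

    incidence[i]: baseline breast cancer incidence (per person-year) in
        year i after the baseline age.
    mortality[i]: competing (non breast cancer) mortality rate in year i.
    Hazards are assumed constant within each one-year interval.
    """
    if len(incidence) != len(mortality):
        raise ValueError("incidence and mortality must cover the same years")
    risk = 0.0
    event_free = 1.0  # probability of being alive and cancer-free so far
    for baseline_hazard, death_hazard in zip(incidence, mortality):
        subject_hazard = relative_risk * baseline_hazard
        total_hazard = subject_hazard + death_hazard
        if total_hazard > 0:
            # Probability that some event occurs this year, apportioned to
            # breast cancer by its share of the total hazard.
            risk += (event_free * (subject_hazard / total_hazard)
                     * (1 - exp(-total_hazard)))
        event_free *= exp(-total_hazard)
    return risk


# Hypothetical rates for a 5-year projection, for illustration only.
five_year_risk = absolute_risk(
    relative_risk=1.6,
    incidence=[0.003] * 5,
    mortality=[0.004] * 5,
)
```

Accounting for competing mortality slightly lowers the estimate relative to ignoring it, since a subject who dies of another cause can no longer develop breast cancer.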

Various suitable databases can be used to calculate the relative risk associated with a family history of breast cancer in a female subject. The Collaborative Group on Hormonal Factors in Breast Cancer (CGoHFiB) provides one example. In another embodiment, relevant demographic data may be obtained from the SEER database (Siegel et al, 2016).

In another embodiment, the first clinical risk assessment, the second clinical risk assessment and the genetic risk assessment are combined to obtain the "relative risk" of the human female subject developing breast cancer. Relative risk (or risk ratio) is measured as the incidence of disease in individuals with a particular characteristic (or exposure) divided by the incidence of disease in individuals without that characteristic, and indicates whether the particular exposure increases or decreases risk. Relative risk helps to identify characteristics associated with disease, but on its own it is not particularly helpful in guiding screening decisions, because the frequency of the risk (incidence) is cancelled out.

Treatment of

After the methods of the present disclosure have been performed, a treatment can be prescribed or administered to the subject.

Thus, in one embodiment, the methods of the present disclosure relate to an anti-cancer therapy for preventing or reducing the risk of breast cancer in a human subject at risk for breast cancer.

One skilled in the art will appreciate that breast cancer is a heterogeneous disease with varying clinical outcomes (Sorlie et al, 2001). For example, it has been discussed in the art that breast cancer may be estrogen receptor positive or estrogen receptor negative. In one embodiment, the methods of the present disclosure are not limited to assessing the risk of developing a particular type or subtype of breast cancer. For example, it is contemplated that the methods of the present disclosure can be used to assess the risk of developing estrogen receptor positive or estrogen receptor negative breast cancer. In another embodiment, the methods of the present disclosure are used to assess the risk of developing estrogen receptor positive breast cancer. In another embodiment, the methods of the present disclosure are used to assess the risk of developing estrogen receptor negative breast cancer. In another embodiment, the methods of the present disclosure are used to assess the risk of developing metastatic breast cancer. In one embodiment, a therapy that inhibits estrogen is prescribed or administered to the subject.

In another embodiment, a chemopreventive agent is prescribed or administered to the subject. Two main classes of drugs are currently used for breast cancer chemoprevention:

(1) Selective estrogen receptor modulators (SERMs), which block the binding of estrogen molecules to their associated cellular receptors. Such drugs include, for example, tamoxifen and raloxifene.

(2) Aromatase inhibitors, which reduce estrogen production by inhibiting the conversion of androgens to estrogens by aromatase. Such drugs include, for example, exemestane, letrozole, anastrozole, vorozole, formestane and fadrozole.

In one embodiment, a SERM or aromatase inhibitor is administered to the subject.

In one embodiment, tamoxifen, raloxifene, exemestane, letrozole, anastrozole, vorozole, formestane or fadrozole is administered to the subject.

In one embodiment, the methods of the present disclosure are used to assess the risk of a human female subject developing breast cancer and to administer a treatment appropriate to that risk. For example, when performing the methods of the present disclosure indicates a high risk of breast cancer, an aggressive chemopreventive treatment regimen may be established. In contrast, when performing the methods of the present disclosure indicates an intermediate risk of breast cancer, a less aggressive chemopreventive treatment regimen may be established. Alternatively, when performing the methods of the present disclosure indicates a low risk of breast cancer, no chemopreventive treatment regimen need be established. It is contemplated that the methods of the present disclosure may be performed repeatedly over time, so that the treatment regimen can be adjusted according to the subject's risk of developing breast cancer.

Marker detection strategy

Amplification primers for amplifying markers (e.g., marker loci) and suitable probes for detecting such markers, or for genotyping a sample with respect to multiple marker alleles, can be used in the present disclosure. For example, primer selection for long-range PCR is described in US 10/042,406 and US 10/236,480; for short-range PCR, US 10/341,832 provides guidance on primer selection. In addition, publicly available programs such as "Oligo" can be used for primer design. With such primer selection and design software, the publicly available human genome sequence and the known locations of the polymorphisms, the skilled person can construct primers to amplify the polymorphisms in order to practice the present disclosure. Furthermore, it will be understood that the precise probe used to detect a nucleic acid comprising a polymorphism (e.g., an amplicon comprising a polymorphism) can vary; for example, any probe that can identify the region of the marker amplicon to be detected can be used in conjunction with the present disclosure. The configuration of the detection probes may, of course, also vary. Thus, the present disclosure is not limited to the sequences described herein.

Indeed, it will be appreciated that marker detection does not require amplification, and that unamplified genomic DNA may be detected directly, for example simply by Southern blotting of a sample of genomic DNA.

In general, molecular markers are detected by any established method available in the art, including but not limited to allele specific hybridization (ASH), extension detection, array hybridization (optionally including ASH) or other methods for detecting polymorphisms, amplified fragment length polymorphism (AFLP) detection, amplified variable sequence detection, randomly amplified polymorphic DNA (RAPD) detection, restriction fragment length polymorphism (RFLP) detection, self-sustained sequence replication detection, simple sequence repeat (SSR) detection, and single strand conformation polymorphism (SSCP) detection.

Examples of oligonucleotide primers that can be used to amplify nucleic acids comprising polymorphisms associated with breast cancer are provided in Table 5. As will be appreciated by those skilled in the art, the sequence of the genomic region to which these oligonucleotides hybridize can be used to design primers that are longer at the 5' and/or 3' end, that are shorter at the 5' and/or 3' end (as long as the truncated form can still be used for amplification), that differ by one or a few nucleotides (but can still be used for amplification), or that share no sequence similarity with those provided but are designed on the basis of genomic sequence close to where the specifically provided oligonucleotides hybridize, and that can still be used for amplification.

TABLE 5 examples of oligonucleotide primers that can be used in the present disclosure.

Figure BDA0002539373160000341

In some embodiments, the primers of the present disclosure are radiolabeled or labeled by any suitable means (e.g., using a non-radioactive fluorescent label) to allow for rapid visualization of amplicons of different sizes after an amplification reaction without the need for any additional labeling or visualization steps. In some embodiments, the primers are not labeled and the amplicons are visualized after resolving their size (e.g., after agarose or acrylamide gel electrophoresis). In some embodiments, ethidium bromide staining of PCR amplicons after size discrimination allows visualization of amplicons of different sizes.

The primers of the present disclosure are not intended to be limited to producing amplicons of any particular size. For example, primers used to amplify the marker loci and alleles herein are not limited to amplifying the entire region of the relevant locus or any sub-region thereof. The primers can produce amplicons of any suitable length for detection. In some embodiments, marker amplification produces an amplicon that is at least 20 nucleotides in length, or at least 50 nucleotides in length, or at least 100 nucleotides in length, or at least 200 nucleotides in length. Amplicons of any size can be detected using the various techniques described herein. Differences in base composition or size can be detected by conventional methods (e.g., electrophoresis).

Some techniques for detecting genetic markers utilize hybridization of probe nucleic acids to nucleic acids corresponding to the genetic markers (e.g., amplified nucleic acids generated using genomic DNA as a template). Hybridization formats that can be used for allele detection include, but are not limited to, solution phase, solid phase, mixed phase and in situ hybridization assays. Detailed guidance on nucleic acid hybridization is found in Tijssen (1993) Laboratory Techniques in Biochemistry and Molecular Biology--Hybridization with Nucleic Acid Probes, Elsevier, New York, and Sambrook et al (supra).

In accordance with the present disclosure, PCR detection can also be performed using dual-labeled fluorogenic oligonucleotide probes (commonly referred to as "TaqMan™" probes). These probes consist of short (e.g., 20-25 bases) oligodeoxynucleotides labeled with two different fluorescent dyes. A reporter dye is located at the 5' end of each probe and a quencher dye at the 3' end. The oligonucleotide probe sequence is complementary to an internal target sequence present in the PCR amplicon. When the probe is intact, energy transfer occurs between the two fluorophores, and the emission from the reporter is quenched by the quencher via FRET. During the extension phase of the PCR, the probe is cleaved by the 5' nuclease activity of the polymerase used in the reaction, thereby releasing the reporter from the oligonucleotide-quencher and producing an increase in the emission intensity of the reporter. Thus, a TaqMan™ probe is an oligonucleotide having a label and a quencher, wherein the label is released during amplification by the exonuclease action of the polymerase used for amplification. This provides a real-time measurement of amplification during synthesis. A variety of TaqMan™ reagents are commercially available, for example, from Applied Biosystems (headquartered in Foster City, Calif.) and from various specialty suppliers such as Biosearch Technologies (e.g., Black Hole Quencher probes). Further details on dual-labeled probe strategies can be found, for example, in WO 92/02638.

Other similar methods include, for example, fluorescence resonance energy transfer between two adjacently hybridized probes, for example using the format described in U.S. Pat. No. 6,174,670.

Array-based detection can be performed using commercially available arrays, such as those from Affymetrix (Santa Clara, Calif.) or other manufacturers. Reviews of the operation of nucleic acid arrays include Sapolsky et al (1999); Lockhart (1998); Fodor (1997a); Fodor (1997b) and Chee et al (1996). Because of its inherently high throughput, array-based detection is a preferred method for identifying the markers of the present disclosure in samples.

The nucleic acid sample to be analyzed is isolated, amplified, and typically labeled with biotin and/or a fluorescent reporter group. The labeled nucleic acid sample is then incubated with the array using a fluidics station and hybridization oven. Depending on the detection method, the array may be washed and/or stained or counter-stained. After hybridization, washing and staining, the array is inserted into a scanner, where the hybridization pattern is detected. Hybridization data are collected as light emitted from the fluorescent reporter groups already incorporated into the labeled nucleic acid, which is now bound to the probe array. Probes that most closely match the labeled nucleic acid produce stronger signals than those with mismatches. Since the sequence and position of each probe on the array are known, the identity of the nucleic acid sample applied to the probe array can be determined by complementarity.

Markers and polymorphisms can also be detected using DNA sequencing. DNA sequencing methods are well known in the art and can be found, for example, in: Ausubel et al, eds., Short Protocols in Molecular Biology, 3rd ed., Wiley, (1995) and Sambrook et al, Molecular Cloning, 2nd ed., Chap. 13, Cold Spring Harbor Laboratory Press, (1989). Sequencing may be performed by any suitable method, for example, dideoxy sequencing, chemical sequencing or variants thereof.

Suitable sequencing methods also include second, third or fourth generation sequencing technologies, all referred to herein as "next generation sequencing", which include, but are not limited to, pyrosequencing, sequencing-by-ligation, single molecule sequencing, sequencing-by-synthesis (SBS), massively parallel clonal, massively parallel single molecule SBS, massively parallel single molecule real-time and massively parallel single molecule real-time nanopore technologies. An overview of some such technologies can be found in Morozova and Marra (2008), which is incorporated herein by reference. Thus, in some embodiments, performing a genetic risk assessment as described herein involves detecting at least two polymorphisms by DNA sequencing. In one embodiment, at least two polymorphisms are detected by next generation sequencing.

Next generation sequencing (NGS) methods share the common feature of massively parallel, high-throughput strategies, with the goal of lower cost compared to older sequencing methods (see Voelkerding et al, 2009; MacLean et al, 2009).

Many such DNA sequencing techniques are known in the art, including fluorescence-based sequencing methods (Birren et al, 1997). In some embodiments, automated sequencing techniques are used. In some embodiments, parallel sequencing of partitioned amplicons is used (PCT publication No. WO 2006084132). In some embodiments, DNA sequencing is achieved by parallel oligonucleotide extension (see, e.g., US 5,750,341 and US 6,306,597). Other examples of sequencing technologies include Church polony technology (Mitra et al, 2003; Shendure et al, 2005; US6,432, 36; US 6,485,944; US 6,511,803), 454 picotiter pyrosequencing technology (Margulies et al, 2005; US 20050130173), Solexa single base addition technology (Bennett et al, 2005; US 6,787,308; US 6,833,246), Lynx massively parallel signature sequencing technology (Brenner et al, 2000; US 5,695,934; US 5,714,330), and Adessi PCR colony technology (Adessi et al, 2000). All documents cited above are incorporated herein by reference.

Correlating markers with phenotypes

These associations can be made by any method that can identify a relationship between an allele and a phenotype, or between a combination of alleles and a phenotype. For example, alleles in a gene or locus as defined herein may be associated with one or more breast cancer phenotypes. Most commonly, these methods involve referencing a look-up table that contains correlations between the alleles of the polymorphisms and the phenotype. The table may include data for multiple allele-phenotype relationships, and may take into account additive or other higher order effects of multiple allele-phenotype relationships, for example, through the use of statistical tools such as principal component analysis, heuristic algorithms, and the like.

Associating a marker with a phenotype optionally includes performing one or more statistical tests for association. Many statistical tests are known, and most are computer-implemented to facilitate analysis. Various statistical methods of determining the correlation/association between phenotypic traits and biomarkers are known and can be applied to the present disclosure (Hartl et al, 1981). Various suitable statistical models are described in Lynch and Walsh (1998). For example, these models can provide correlations between genotypic and phenotypic values, characterize the influence of a locus on a phenotype, characterize the relationship between environment and genotype, determine the dominance or penetrance of genes, determine maternal and other epigenetic effects, determine principal components in an analysis (by principal component analysis or "PCA"), and the like. The references cited in these texts provide further detail on statistical models for associating markers and phenotypes.

In addition to standard statistical methods for determining correlation, other methods that determine correlation by pattern recognition and training, such as the use of genetic algorithms, can be used to determine the correlation between markers and phenotypes. This is particularly useful when identifying higher order correlations between multiple alleles and multiple phenotypes. To illustrate, neural network approaches can be coupled to genetic algorithm-type programming for heuristic development of a structure-function data space model that determines correlations between genetic information and phenotypic outcomes.

In any case, essentially any statistical test can be applied in a computer-implemented model, by standard programming methods, or using any of a variety of "off-the-shelf" software packages that perform such statistical analyses, including, for example, those described above and those commercially available, e.g., from Partek Incorporated (St. Peters, Mo.; www.partek.com), which provides software for pattern recognition (e.g., Partek Pro 2000 pattern recognition software).

Further details on association studies can be found in US 10/106,097, US 10/042,819, US 10/286,417, US 10/768,788, US 10/447,685, US 10/970,761 and US 7,127,355.

Systems for making the above associations are also a feature of the present disclosure. Typically, the system will include system instructions that correlate the presence or absence of an allele (whether detected directly or, for example, through expression levels) with a predicted phenotype.

Optionally, the system instructions may also include software that accepts diagnostic information associated with any detected allele information, such as a diagnosis that a subject with the relevant allele has a particular phenotype. The software may be heuristic in nature, using such input associations to improve the accuracy of the look-up table and/or the interpretation of the look-up table by the system. Various such methods are described above, including neural networks, Markov modeling and other statistical analyses.

Polymorphism profiling

The present disclosure provides methods of determining the polymorphism profile of an individual at the polymorphisms outlined in the present disclosure (e.g., Table 6 or Table 12), or at polymorphisms in linkage disequilibrium with one or more thereof.

A polymorphism profile consists of the polymorphic forms occupying the various polymorphic sites in an individual. In a diploid genome, two polymorphic forms, which may be the same as or different from each other, usually occupy each polymorphic site. Thus, the polymorphism profile at sites X and Y can be represented in the form X (x1, x1) and Y (y1, y2), where x1, x1 represents two copies of allele x1 occupying site X, and y1, y2 represents heterozygous alleles occupying site Y.

The polymorphism profile of an individual can be assessed by comparison with the polymorphic forms associated with resistance or susceptibility to breast cancer that occur at each site. The comparison can be made at, e.g., at least 1, 2, 5, 10, 25, 50 or all of the polymorphic sites, and optionally at others in linkage disequilibrium with them. Polymorphic sites can be analyzed in combination with other polymorphic sites.

Polymorphism profiling is useful, for example, in selecting agents for the treatment or prevention of breast cancer in a given individual. Individuals with similar polymorphism profiles are likely to respond to an agent in a similar manner.

Polymorphism profiling may also be used to stratify individuals in clinical trials of agents being tested for their ability to treat breast cancer or related conditions. Such trials are performed on treated or control populations having similar or identical polymorphism profiles (see EP 99965095.5), e.g., polymorphism profiles indicating that an individual has an increased risk of developing breast cancer. The use of genetically matched populations eliminates or reduces variation in treatment outcome due to genetic factors, allowing the efficacy of a potential drug to be assessed more accurately.

Polymorphism profiles may also be used to exclude individuals who are not predisposed to breast cancer from clinical trials. Including such individuals in a trial increases the size of the population required to obtain statistically significant results. Individuals who are not predisposed to breast cancer can be identified by determining the numbers of resistance and susceptibility alleles in the polymorphism profile, as described above. For example, if a subject is genotyped at 10 sites in 10 genes of the disclosure associated with breast cancer, a total of 20 alleles is determined. If more than 50%, or more than 60% or 75%, of these are resistance alleles, the individual is unlikely to develop breast cancer and can be excluded from the trial.

In other embodiments, stratification of individuals in a clinical trial may be achieved using polymorphism profiling in combination with other stratification methods including, but not limited to, risk models (e.g., Gail score, Claus model), clinical phenotypes (e.g., atypical lesions, breast density), and specific candidate markers.
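The allele-counting rule in the example above can be sketched as follows. This is a minimal illustration under stated assumptions: the profile is represented as a mapping from site to a pair of alleles, the site names and allele labels are hypothetical, and the function names are not taken from the present disclosure.

```python
def resistance_allele_fraction(profile, resistance_alleles):
    """Fraction of the determined alleles that are resistance alleles.

    profile: dict site -> (allele_1, allele_2) for a diploid genotype.
    resistance_alleles: dict site -> the allele treated as the resistance
        (low-risk) allele at that site.
    """
    total = 0
    resistant = 0
    for site, alleles in profile.items():
        for allele in alleles:
            total += 1
            resistant += allele == resistance_alleles[site]
    if total == 0:
        raise ValueError("profile contains no genotyped sites")
    return resistant / total


def exclude_from_trial(profile, resistance_alleles, threshold=0.5):
    """True if more than `threshold` of the alleles are resistance alleles."""
    return resistance_allele_fraction(profile, resistance_alleles) > threshold


# Hypothetical subject genotyped at 10 sites (20 alleles in total):
# homozygous for the resistance allele "A" at 6 sites, for "B" at 4 sites.
sites = [f"site{i}" for i in range(10)]
resistance = {site: "A" for site in sites}
subject = {site: ("A", "A") if i < 6 else ("B", "B")
           for i, site in enumerate(sites)}
excluded_at_50 = exclude_from_trial(subject, resistance, threshold=0.5)
excluded_at_75 = exclude_from_trial(subject, resistance, threshold=0.75)
```

Here 12 of the 20 alleles are resistance alleles, so the subject would be excluded under the 50% threshold but not under the 75% threshold.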

Computer implemented method

It is contemplated that the methods of the present disclosure may be implemented by a system, for example as a computer-implemented method. For example, the system may be a computer system including one or more processors (referred to collectively as a "processor" for convenience), which may be operatively connected to a memory. The memory may be a non-transitory computer readable medium such as a hard disk drive, a solid state disk, or a CD-ROM. Software, i.e., executable instructions or program code, e.g., program code grouped into code modules, may be stored in the memory and, when executed by the processor, may cause the computer system to perform functions such as: determining that a task is to be performed to assist a user in determining the risk of a human female subject for developing breast cancer; receiving data representing a first clinical risk assessment, a second clinical risk assessment, and a genetic risk assessment of the female subject's risk of developing breast cancer, wherein the genetic risk is derived by detecting at least 2 polymorphisms known to be associated with breast cancer; processing the data to combine the first clinical risk assessment, the second clinical risk assessment and the genetic risk assessment to obtain the risk of the human female subject for developing breast cancer; and outputting that risk.

For example, the memory may include program code that, when executed by the processor, causes the system to: determine the presence of at least 2 polymorphisms associated with breast cancer; process the data to combine the first clinical risk assessment, the second clinical risk assessment and the genetic risk assessment to obtain the risk of the human female subject for developing breast cancer; and report that risk.

In another embodiment, the system may be connected to a user interface to enable the system to receive information from a user and/or to output or display information. For example, the user interface may include a graphical user interface, a voice user interface, or a touch screen.

In one embodiment, the program code can cause the system to determine a "polymorphism risk".

In one embodiment, the program code may cause the system to determine the combined risk as first clinical risk x second clinical risk x genetic risk (e.g., polymorphism risk).
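As a minimal Python sketch of the multiplicative combination described above, the three assessments can be held in a small record and multiplied together. The class name, field names, and the example values are hypothetical; how each component score is derived and on what scale it is expressed are not specified in this passage, so the sketch simply assumes three positive numeric scores that are already available.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RiskAssessments:
    """Three component scores for one subject (hypothetical container)."""
    first_clinical_risk: float   # e.g., from a clinical risk model
    second_clinical_risk: float  # e.g., based on breast density
    genetic_risk: float          # e.g., a polymorphism risk


def combined_risk(assessments: RiskAssessments) -> float:
    """Combine the component scores by multiplication."""
    values = (
        assessments.first_clinical_risk,
        assessments.second_clinical_risk,
        assessments.genetic_risk,
    )
    if any(v <= 0 for v in values):
        raise ValueError("each component score must be positive")
    product = 1.0
    for v in values:
        product *= v
    return product


# Illustrative values only; they are not taken from the disclosure.
example = RiskAssessments(first_clinical_risk=2.0,
                          second_clinical_risk=1.5,
                          genetic_risk=0.5)
score = combined_risk(example)  # 2.0 * 1.5 * 0.5
```

Multiplying the scores treats the three assessments as contributing independently, which is an assumption of this sketch rather than a statement made in this passage.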

In one embodiment, the system may be configured to communicate with at least one remote device or server over a communication network, such as a wireless communication network. For example, the system may be configured to receive information from a device or server over a communication network and to transmit information to the same or a different device or server over the communication network. In other embodiments, the system may be isolated from direct user interaction.

In another embodiment, performing the methods of the present disclosure to assess the risk of a human female subject for breast cancer enables the establishment of a diagnostic or prognostic rule based on the first clinical risk assessment, the second clinical risk assessment, and the genetic risk assessment of a female subject for breast cancer. For example, the diagnostic or prognostic rules can be based on a combined first clinical risk x second clinical risk x genetic risk score relative to a control, standard or threshold level of risk.

In one embodiment, the threshold level of risk is the level recommended by the American Cancer Society (ACS) guidelines for screening with breast MRI and mammography. In this embodiment, the threshold level is preferably greater than about 20% lifetime risk.

In another embodiment, the threshold level of risk is the level recommended by the American Society of Clinical Oncology (ASCO) for offering estrogen receptor therapy to reduce the subject's risk. In this embodiment, the threshold level of risk is preferably a Gail index 5-year risk greater than 1.66%.
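The two threshold embodiments above can be expressed as a simple rule, sketched here in Python. The function and class names are hypothetical, and the sketch assumes that the subject's lifetime risk and 5-year risk have already been obtained as absolute risks expressed as fractions between 0 and 1; the conversion from a combined score to an absolute risk is not covered by this passage.

```python
from dataclasses import dataclass

# Thresholds taken from the two embodiments above.
ACS_LIFETIME_THRESHOLD = 0.20     # about 20% lifetime risk
ASCO_FIVE_YEAR_THRESHOLD = 0.0166  # 5-year risk greater than 1.66%


@dataclass(frozen=True)
class RuleResult:
    exceeds_acs_threshold: bool   # lifetime risk above the ACS level
    exceeds_asco_threshold: bool  # 5-year risk above the ASCO level


def apply_threshold_rules(lifetime_risk: float,
                          five_year_risk: float) -> RuleResult:
    """Compare absolute risks (as fractions) against the two thresholds."""
    for value in (lifetime_risk, five_year_risk):
        if not 0.0 <= value <= 1.0:
            raise ValueError("risks must be fractions between 0 and 1")
    return RuleResult(
        exceeds_acs_threshold=lifetime_risk > ACS_LIFETIME_THRESHOLD,
        exceeds_asco_threshold=five_year_risk > ASCO_FIVE_YEAR_THRESHOLD,
    )


# Illustrative subject: 25% lifetime risk, 1.2% 5-year risk.
result = apply_threshold_rules(lifetime_risk=0.25, five_year_risk=0.012)
```

For the illustrative subject, the lifetime risk exceeds the ACS level while the 5-year risk does not exceed the ASCO level.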

In another embodiment, the diagnostic or prognostic rules are based on the application of statistical and machine learning algorithms. Such an algorithm uses the relationships between a set of polymorphisms and disease status observed in training data (with known disease status) to infer relationships, which are then used to determine the risk of developing breast cancer in human female subjects whose risk is unknown. The algorithm employed provides a risk of the human female subject developing breast cancer, and performs a multivariate or univariate analysis function.

Polymorphisms indicative of breast cancer risk

Examples of polymorphisms indicative of breast cancer risk are shown in Table 6 and Table 12. 77 polymorphisms are informative in Caucasians, 78 in African-Americans, and 82 in Hispanics. 70 polymorphisms are informative in Caucasians, African-Americans and Hispanics (indicated by the horizontal striped pattern; see also Table 7). The remaining 18 polymorphisms (see Table 8) are informative in Caucasians (represented by the dark grid pattern; see also Table 9), African-Americans (represented by the diagonal downward striped pattern; see also Table 10), and/or Hispanics (represented by the grid pattern; see also Table 11). Tables 13 and 14 show optimized lists of polymorphisms informative in African-Americans and Hispanics, respectively.

Table 6. Polymorphisms indicating breast cancer risk (n = 88)

Figure BDA0002539373160000441

Table 7. Polymorphisms common to Caucasian, African-American, and Hispanic groups (n = 70)

Figure BDA0002539373160000471

Table 8. Polymorphisms not shared by Caucasian, African-American, and Hispanic groups (n = 18)

Figure BDA0002539373160000472

Figure BDA0002539373160000481

Legend

Figure BDA0002539373160000482

Table 9. Caucasian polymorphisms (n = 77). Alleles are denoted as major/minor (e.g., for rs616488, A is the common allele and G is less common). An odds ratio (OR) for the minor allele of less than 1 means that the minor allele is not a risk allele; an OR greater than 1 means that the minor allele is a risk allele.

Figure BDA0002539373160000483

Figure BDA0002539373160000491

Figure BDA0002539373160000501

Figure BDA0002539373160000511

Table 10. African-American polymorphisms (n = 78). Alleles are denoted as risk/reference (non-risk) (e.g., for rs616488, A is the risk allele).

Figure BDA0002539373160000541

Table 11. Hispanic polymorphisms (n = 82). Alleles are denoted as major/minor (e.g., for rs616488, A is the common allele and G is less common). An odds ratio (OR) for the minor allele of less than 1 means that the minor allele is not a risk allele; an OR greater than 1 means that the minor allele is a risk allele.

Figure BDA0002539373160000581

Table 12. Extended polymorphism list indicating breast cancer risk (n = 203).

"x" indicates a deletion of 21 base pairs, "x" indicates a deletion of 36 base pairs, "x" indicates an insertion of 14 base pairs, "x" indicates a deletion of 31 kb. An "odds ratio minor allele" value below 1 means that the minor allele is not a risk allele, and when greater than 1, means that the minor allele is a risk allele.

Figure BDA0002539373160000611

Figure BDA0002539373160000631

Figure BDA0002539373160000661

Table 13. Optimized African-American polymorphism list (n = 74).

Figure BDA0002539373160000662

Table 14. Optimized Hispanic polymorphism list (n = 71).

Figure BDA0002539373160000691

Figure BDA0002539373160000711
