System for automated in situ hybridization analysis

Document No.: 884218. Publication date: 2021-03-19.

Abstract: This technology, System for automated in situ hybridization analysis, was designed by S·楚卡, T·M·格罗根, and A·萨卡 on 2019-07-10. The present disclosure provides image processing systems and methods for automatically analyzing digital images (311) of biological samples stained for the presence of protein and/or nucleic acid biomarkers and for automatically detecting and quantifying signals (314) corresponding to one or more biomarkers. The present disclosure also provides systems and methods for clinical interpretation of dual ISH slides in which the cells to be scored are automatically selected (e.g., by using one or more cell detection and identification algorithms (204)). It is believed that subjectivity can be reduced or eliminated by automatically detecting, identifying, and selecting cells for evaluation. Compared with manual point counting methods, the automated systems and methods also allow an increase in the number of cells considered for scoring, thereby increasing detection sensitivity and ultimately resulting in improved care and treatment of the patient.

1. A system for assessing genetic aberrations in an image of a biological sample stained for the presence of at least one nucleic acid biomarker, the system comprising: (i) one or more processors, and (ii) one or more memories coupled with the one or more processors, the memories for storing computer-executable instructions that, when executed by the one or more processors, cause the system to perform operations comprising:

(a) running a detection algorithm (204) to automatically detect and identify cells (411) in the first image stained for the presence of the at least one protein biomarker that meet a predetermined protein biomarker staining criterion;

(b) deriving a tumor tissue region in the first image encompassing identified cells that meet the predetermined protein biomarker staining criterion (412);

(c) performing an automatic registration of the first and second images with a common coordinate system such that the derived tumor tissue region in the first image is mapped to the second image to provide a mapped tumor tissue region, wherein the second image comprises a signal (413) corresponding to the presence of at least one nucleic acid biomarker;

(d) automatically identifying points (414) within the mapped tumor tissue region corresponding to the signal from the at least one nucleic acid biomarker; and

(e) assessing whether tumor nuclei in the mapped tumor tissue region in the second image have a genetic aberration based on the identified points (415).

2. The system of claim 1, wherein the genetic aberration of the cell nuclei is assessed by automatically determining whether a total number of identified points in each cell nucleus corresponding to the signal from the at least one nucleic acid biomarker meets a predetermined threshold.

3. The system of any one of the preceding claims, wherein the genetic aberration of the cell nucleus is assessed by: (i) calculating, for each cell nucleus, a ratio of first identified points corresponding to a first nucleic acid biomarker to second identified points corresponding to a second nucleic acid biomarker; and (ii) comparing the calculated ratio for each nucleus to a predetermined threshold.

4. The system of claim 3, wherein the first nucleic acid biomarker is HER2 and the second nucleic acid biomarker is chromosome 17, and wherein the at least one protein biomarker is a HER2 protein biomarker.

5. The system of claim 3, wherein the first nucleic acid biomarker is EGFR and the second nucleic acid biomarker is chromosome 7, and wherein the at least one protein biomarker is an EGFR protein biomarker.

6. The system of claim 3, further comprising associating a total number of points associated with the first stain and a total number of points associated with the second stain with the cell nucleus location information.

7. The system of claim 3, further comprising assigning a first indicator to those assessed cell nuclei for which the calculated ratio is above the predetermined threshold and assigning a second indicator to those assessed cell nuclei for which the calculated ratio is equal to or below the predetermined threshold.

8. The system of claim 7, further comprising generating an overlay image based on the assigned first and second indicators.

9. The system of claim 3, further comprising generating a binned histogram of the calculated ratios of all assessed nuclei in each mapped tissue region.

10. The system of claim 9, further comprising an operation for identifying the bin of the histogram having the maximum count.

11. The system of claim 9, further comprising operations for determining a course of treatment based on data from the generated binned histogram.

12. The system of claim 3, further comprising ranking the assessed nuclei according to a calculated ratio of each nucleus.

13. The system of any one of the preceding claims, wherein the predetermined protein biomarker staining criterion is a staining intensity threshold.

14. The system of claim 13, wherein the staining intensity threshold is a cutoff value for the presence of membrane staining.

15. The system of any one of the preceding claims, wherein the genetic aberration comprises an abnormal gene copy number.

16. The system of claim 15, wherein the abnormal gene copy number is a copy number that is greater than a normal copy number of the gene.

17. The system of any one of the preceding claims, wherein the genetic aberration comprises a chromosomal abnormality.

18. The system of any of the preceding claims, wherein the number of assessed nuclei in each mapped tissue region is greater than 20.

19. A method of assessing genetic aberrations in an image of a biological sample stained for the presence of at least one nucleic acid biomarker, the method comprising:

(a) automatically detecting cells in the first image stained for the presence of the at least one protein biomarker that meet a predetermined protein biomarker staining criterion (411);

(b) deriving a tumor tissue region in the first image encompassing identified cells that meet the predetermined protein biomarker staining criterion (412);

(c) automatically registering the first and second images with a common coordinate system such that the derived tumor tissue region in the first image is mapped to the second image to provide a mapped tissue region, wherein the second image comprises a signal corresponding to the presence of at least one nucleic acid biomarker (413);

(d) automatically identifying points within the mapped tissue region corresponding to signals from the at least one nucleic acid biomarker (414); and

(e) assessing whether tumor nuclei in the mapped tissue region in the second image have a genetic aberration based on the identified points corresponding to the at least one nucleic acid biomarker (415).

20. The method of claim 19, wherein the genetic aberration of the nucleus is assessed by: (i) calculating, for each cell nucleus, a ratio of first identified points corresponding to a first nucleic acid biomarker to second identified points corresponding to a second nucleic acid biomarker; and (ii) comparing the calculated ratio for each nucleus to a predetermined threshold.

21. The method of claim 20, further comprising assigning a first indicator to those tumor nuclei for which the calculated ratio is above the predetermined threshold and assigning a second indicator to those tumor nuclei for which the calculated ratio is equal to or below the predetermined threshold.

22. The method of claim 21, further comprising generating an overlay image based on the assigned first and second indicators.

23. The method of claim 20, further comprising generating a binned histogram of the calculated ratios of all identified nuclei.

24. The method of claim 23, further comprising ranking the assessed nuclei according to a calculated ratio for each nucleus.

25. The method of any one of claims 19-24, wherein the biological sample is stained for the presence of the HER2 and chromosome 17 nucleic acid biomarkers.

26. The method of claim 25, wherein the points are identified by:

(a) automatically detecting points in the mapped tissue region that meet absorbance intensity, black unmixed image channel intensity, red unmixed image channel intensity, and difference of Gaussians threshold criteria; and

(b) automatically classifying the detected points as belonging to a black nucleic acid biomarker signal corresponding to HER2 or a red nucleic acid biomarker signal corresponding to chromosome 17.

27. The method of claim 26, wherein the tumor cell nucleus is assessed by: (i) calculating a ratio of those classified points that belong to the black nucleic acid biomarker signal to those classified points that belong to the red nucleic acid biomarker signal; and (ii) comparing the calculated ratio to a predetermined threshold.

28. The method of claim 27, wherein the at least one protein biomarker is a HER2 protein biomarker.

29. The method of claim 28, further comprising identifying whether a patient is positive or negative for HER2 based on said assessed tumor cell nuclei.

30. The method of claim 28, further comprising scoring the biological sample for the presence of at least one additional protein biomarker.

31. The method of claim 30, wherein the at least one additional protein biomarker is EGFR.

32. The method of any one of claims 19-31, wherein the genetic aberration is RNA overexpression.

33. A non-transitory computer-readable medium storing instructions for assessing genetic aberrations in a biological sample stained for the presence of at least one nucleic acid biomarker, the instructions comprising:

(a) identifying cells in the first image that have been stained for the presence of at least one protein biomarker and that meet a predetermined protein biomarker staining criterion;

(b) deriving a tumor tissue region in the first image encompassing identified cells that meet the predetermined protein biomarker staining criterion (412);

(c) performing an automatic registration of the first and second images with a common coordinate system such that the derived tumor tissue region in the first image is mapped to the second image to provide a mapped tissue region, wherein the second image comprises a signal (413) corresponding to the presence of at least one nucleic acid biomarker;

(d) automatically detecting a point within the mapped tissue region corresponding to a signal from the at least one nucleic acid biomarker (414);

(e) counting all detected points within each tumor cell nucleus within each mapped tissue region; and

(f) assessing each tumor cell nucleus in each mapped tissue region for a genetic aberration based on the total number of points counted in each cell nucleus (415).

34. The non-transitory computer-readable medium of claim 33, wherein the biological sample is stained for the presence of at least two nucleic acid biomarkers, and wherein spots corresponding to each of the at least two nucleic acid biomarkers are detected and counted.

35. The non-transitory computer-readable medium of claim 34, wherein the tumor cell nucleus is assessed by: (i) calculating a ratio of the counted first points to the counted second points; and (ii) comparing the calculated ratio to a clinically relevant threshold.

36. The non-transitory computer-readable medium of claim 35, further comprising instructions for generating an image overlay, wherein each assessed nucleus is assigned a color based on whether the calculated ratio is (i) at or below the clinically relevant threshold; or (ii) above the clinically relevant threshold.

37. The non-transitory computer-readable medium of claim 35, further comprising instructions for ranking the assessed nuclei in each mapped tissue region according to the calculated ratio.

38. The non-transitory computer-readable medium of claim 35, further comprising instructions for generating a binned histogram of the calculated ratios.

39. The non-transitory computer readable medium of any one of claims 33-38, wherein the predetermined protein biomarker staining criterion is a staining intensity threshold.

40. The non-transitory computer readable medium of any one of claims 33-39, wherein the genetic aberration is selected from the group consisting of an abnormal gene copy number and a chromosomal abnormality.

Technical Field

The present disclosure provides systems and methods for detecting and classifying signals in a stained image of a biological sample.

Background

Digital pathology involves scanning an entire histopathology or cytopathology slide into a digital image that can be interpreted on a computer screen. These images are then processed by imaging algorithms or interpreted by a pathologist. Because tissue sections are nearly transparent, they are prepared with colored stains that selectively bind to cellular components before examination. Clinicians or computer-aided diagnosis (CAD) algorithms use the color-enhanced, or stained, cellular structures to identify morphological markers of disease and to guide therapy accordingly. Observing the assay supports a variety of processes, including diagnosing disease, assessing response to treatment, and developing new drugs to fight disease.

Immunohistochemical (IHC) slide staining can be used to identify proteins in the cells of a tissue section and is therefore widely used to study different types of cells, such as cancer cells and immune cells, in biological tissue. Thus, IHC staining can be used to study the distribution and localization of biomarkers differentially expressed by immune cells (e.g., T cells or B cells) in cancer tissues for immune response studies. For example, tumors often contain infiltrates of immune cells, which may prevent the development of the tumor or promote tumor growth.

In situ hybridization (ISH) can be used to determine the presence or absence of genetic abnormalities or specific amplification of oncogenes in cells that appear morphologically malignant when observed under a microscope. Unique nucleic acid sequences occupy precise positions in chromosomes, cells, and tissues, and in situ hybridization allows the presence, absence, and/or amplification status of these sequences to be determined without major disruption of the sequences. ISH employs labeled DNA or RNA probe molecules that are antisense to a target gene sequence or transcript to detect or localize the targeted nucleic acid within a cell or tissue sample. ISH is performed by exposing a cell or tissue sample immobilized on a glass slide to a labeled nucleic acid probe that is capable of specifically hybridizing to a given target gene in the cell or tissue sample. Multiple target genes can be analyzed simultaneously by exposing a cell or tissue sample to multiple nucleic acid probes that have been labeled with multiple different nucleic acid tags. With labels having different emission wavelengths, multicolor analysis can be performed simultaneously in a single step on a single target cell or tissue sample.

Disclosure of Invention

It is believed that IHC and ISH may target different molecules (e.g., biomarkers), where one molecule may be a precursor to another. Likewise, performing ISH and IHC together (e.g., simultaneously or sequentially) may provide complementary information to determine the source of secreted proteins, determine complex tissue architecture, determine regulation of gene expression, and/or assess therapy. In view of the foregoing, in some embodiments, the present disclosure provides systems and methods for detecting genetic aberrations (e.g., high copy number, chromosomal abnormalities, etc.) in cells that are selected for assessment (e.g., automatically selected for assessment). In some embodiments, the cells automatically selected for assessment are located within a tumor tissue region that includes cells that meet a predetermined protein biomarker staining criterion. It is believed that an automated cell selection process may reduce and/or eliminate any subjectivity introduced during manual selection of cells. In addition, the systems and methods of the present disclosure allow a greater number of cells to be used to detect genetic aberrations, thereby facilitating more robust assessments and ultimately providing improved patient care and therapy.

In one aspect of the present disclosure is a system for assessing genetic aberrations in an image of a biological sample (e.g., a sample stained for the presence of at least one nucleic acid biomarker and/or protein biomarker), the system comprising: (i) one or more processors, and (ii) one or more memories coupled with the one or more processors, the one or more memories storing computer-executable instructions that, when executed by the one or more processors, cause the system to perform operations comprising: (a) identifying cells in the first image that have been stained for the presence of at least one protein biomarker (e.g., a HER2 protein biomarker) and that meet a predetermined protein biomarker staining criterion; (b) deriving a tumor tissue region (e.g., an epithelial tumor tissue region) in the first image encompassing identified cells that meet the predetermined protein biomarker staining criterion; (c) registering the first and second images with a common coordinate system such that the derived tumor tissue region in the first image is mapped to the second image to provide a mapped tumor tissue region, wherein the second image comprises signals corresponding to the presence of at least one nucleic acid biomarker (such as HER2 and chromosome 17 nucleic acid biomarkers); (d) identifying points within the mapped tumor tissue region corresponding to the signals from the at least one nucleic acid biomarker; and (e) assessing whether tumor nuclei in the mapped tumor tissue region in the second image have genetic aberrations (e.g., an abnormal gene copy number) based on the identified points. In some embodiments, the number of nuclei assessed in each mapped tissue region is greater than 20.
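As a minimal sketch of operation (c), assuming the registration step has already produced a 2D affine transform between the two images, the tumor tissue region derived in the first image could be mapped into the second image by transforming the vertices of its contour. The function names, the contour representation, and the matrix values below are illustrative assumptions and are not taken from the disclosure.

```python
# Hypothetical sketch: map a region contour from the first image into the
# second image using an affine transform assumed to come from registration.

def apply_affine(point, matrix):
    """Apply a 2x3 affine matrix ((a, b, tx), (c, d, ty)) to an (x, y) point."""
    x, y = point
    (a, b, tx), (c, d, ty) = matrix
    return (a * x + b * y + tx, c * x + d * y + ty)


def map_region(contour, matrix):
    """Map every vertex of a region contour into the second image's coordinates."""
    return [apply_affine(vertex, matrix) for vertex in contour]


# Assumed example: the second image is shifted by (+5, -3) relative to the first.
translation = ((1, 0, 5), (0, 1, -3))
tumor_contour = [(0, 0), (10, 0), (10, 10), (0, 10)]
mapped_contour = map_region(tumor_contour, translation)
```

In practice the transform could be non-rigid; an affine matrix is used here only because it keeps the mapping step easy to follow.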

In some embodiments, the genetic aberration of each cell nucleus is assessed by determining whether the total number of identified points in the cell nucleus corresponding to signals from at least one nucleic acid biomarker meets a predetermined threshold (e.g., when analyzing a single nucleic acid biomarker). In other embodiments, the genetic aberration of the nucleus is assessed by: (i) calculating a ratio of first identified points in each cell nucleus corresponding to a signal from a first nucleic acid biomarker to second identified points in each cell nucleus corresponding to a second nucleic acid biomarker; and (ii) comparing the calculated ratio for each nucleus to a predetermined threshold. In some embodiments, the first nucleic acid biomarker is HER2 and the second nucleic acid biomarker is chromosome 17; and wherein the at least one protein biomarker is a HER2 protein biomarker. In some embodiments, the first nucleic acid biomarker is EGFR, and the second nucleic acid biomarker is chromosome 7; and wherein the at least one protein biomarker is an EGFR protein biomarker. One skilled in the art will recognize that other protein biomarkers and nucleic acid biomarkers, including biomarkers that are precursors of each other, may be utilized.
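The two assessment modes above (a total-count threshold for a single biomarker, and a ratio threshold for two biomarkers) can be sketched as follows. This is only an illustration: the class and function names are hypothetical, the threshold values are placeholders standing in for the "predetermined threshold", and treating a nucleus with no second-biomarker points as not assessable is an assumption.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class NucleusCounts:
    nucleus_id: int
    first_points: int   # points for the first biomarker (e.g., HER2)
    second_points: int  # points for the second biomarker (e.g., chromosome 17)


def meets_count_threshold(nucleus: NucleusCounts, threshold: int) -> bool:
    """Single-biomarker mode: compare the total point count to a threshold."""
    return nucleus.first_points >= threshold


def signal_ratio(nucleus: NucleusCounts) -> Optional[float]:
    """Two-biomarker mode: ratio of first points to second points.

    Returns None when there are no second-biomarker points (assumed handling).
    """
    if nucleus.second_points == 0:
        return None
    return nucleus.first_points / nucleus.second_points


def ratio_above_threshold(nucleus: NucleusCounts, threshold: float = 2.0) -> bool:
    """True when the ratio is strictly above the (placeholder) threshold."""
    ratio = signal_ratio(nucleus)
    return ratio is not None and ratio > threshold
```

For example, a nucleus with 6 first-biomarker points and 2 second-biomarker points has a ratio of 3.0, which is above the placeholder threshold of 2.0.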

In some embodiments, the system further comprises assigning a first indicator (e.g., a first color, or a first symbol) to those assessed cell nuclei whose calculated ratio is above a predetermined threshold, and assigning a second indicator (e.g., a second color, or a second symbol) to those assessed cell nuclei whose calculated ratio is at or below the predetermined threshold. In some embodiments, the system further comprises generating an overlay image based on the assigned first indicator and the assigned second indicator. In some embodiments, the generated overlay image is superimposed on the whole slide image or a portion thereof. In some embodiments, the overlay image is a foreground segmentation mask.
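A minimal sketch of the indicator assignment described above, assuming each assessed nucleus is summarized by a centroid and a calculated ratio. The colors, the sparse dictionary used as the overlay, and the function names are illustrative choices rather than details from the disclosure.

```python
# Hypothetical sketch: assign one of two indicators per nucleus and collect
# them into a sparse overlay keyed by nucleus centroid.

def assign_indicator(ratio, threshold, above="red", at_or_below="green"):
    """Return the first indicator above the threshold, else the second."""
    return above if ratio > threshold else at_or_below


def build_overlay(nuclei, threshold):
    """Build {(x, y): indicator} from an iterable of (x, y, ratio) tuples."""
    return {(x, y): assign_indicator(ratio, threshold) for x, y, ratio in nuclei}


# Assumed example data: two nuclei and a placeholder threshold of 2.0.
overlay = build_overlay([(10, 20, 3.0), (5, 5, 2.0)], threshold=2.0)
```

A renderer could then draw each indicator at its coordinates on top of the slide image or a portion of it.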

In some embodiments, the system further comprises sorting the assessed nuclei according to the calculated ratio of each nucleus (e.g., the nuclei may be sorted in a table in which the calculated ratios appear in descending order, the table optionally including location information, such as the coordinates of the nuclei or cells within the image). In some embodiments, the system further comprises generating a binned histogram of the calculated ratios of all assessed cell nuclei. In some embodiments, a respective binned histogram is generated for each mapped tissue region. In some embodiments, the system further comprises identifying the bin of the histogram having the largest count. In some embodiments, the system further comprises determining a course of treatment (e.g., whether to administer targeted therapy; whether to administer combination therapy) based on data from the generated binned histograms.
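The ranking and histogram operations described above might look like the following sketch. The record layout, the bin width of 0.5, and the function names are assumptions made for illustration only.

```python
import math
from collections import Counter


def rank_by_ratio(nuclei):
    """Sort nucleus records (dicts with 'ratio', 'x', 'y') by descending ratio."""
    return sorted(nuclei, key=lambda nucleus: nucleus["ratio"], reverse=True)


def binned_histogram(ratios, bin_width=0.5):
    """Count ratios per bin; keys are the lower edges of the non-empty bins."""
    counts = Counter(math.floor(ratio / bin_width) for ratio in ratios)
    return {index * bin_width: count for index, count in sorted(counts.items())}


def modal_bin(histogram):
    """Return the lower edge of the bin with the largest count."""
    return max(histogram, key=histogram.get)


# Assumed example data for one mapped tissue region.
nuclei = [
    {"ratio": 1.2, "x": 4, "y": 9},
    {"ratio": 3.0, "x": 7, "y": 2},
    {"ratio": 0.9, "x": 1, "y": 1},
    {"ratio": 1.4, "x": 8, "y": 8},
]
ranked = rank_by_ratio(nuclei)
histogram = binned_histogram([nucleus["ratio"] for nucleus in nuclei])
```

Keeping the coordinates in each record means the sorted table retains the location information mentioned in the text.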

In some embodiments, the predetermined protein biomarker staining criterion is a staining intensity threshold. In some embodiments, the staining intensity threshold is a cutoff value for the presence of membrane staining. In some embodiments, the predetermined biomarker staining criterion is an expression score calculated for the cell. In some embodiments, the genetic aberration refers to an abnormal gene copy number. In some embodiments, the abnormal gene copy number is a copy number that is greater than the normal copy number of the gene. In some embodiments, the genetic aberration is a chromosomal abnormality. In some embodiments, the genetic aberration refers to RNA overexpression.

Another aspect of the present disclosure is a method of assessing genetic aberrations in an image of a biological sample stained for the presence of at least one nucleic acid biomarker, the method comprising: detecting cells in the first image stained for the presence of at least one protein biomarker that meet a predetermined protein biomarker staining criterion; deriving a tumor tissue region in the first image encompassing identified cells that meet the predetermined protein biomarker staining criteria; registering the first and second images with a common coordinate system such that the derived tumor tissue region in the first image is mapped to the second image to provide a mapped tissue region, wherein the second image comprises a signal corresponding to the presence of at least one nucleic acid biomarker; identifying a point within the mapped tissue region corresponding to the signal from the at least one nucleic acid biomarker; and assessing whether tumor nuclei in the mapped tissue region in the second image have a genetic aberration based on the identified points corresponding to the signals from the at least one nucleic acid biomarker. In some embodiments, the genetic aberration refers to RNA overexpression. In some embodiments, the genetic aberration refers to an abnormal gene copy number. In some embodiments, the abnormal gene copy number is a copy number that is greater than the normal copy number of the gene.

In some embodiments, the genetic aberration of the tumor cell nucleus is assessed by: (i) for each cell nucleus, calculating a ratio of the identified points corresponding to the signal from the first nucleic acid biomarker to the identified points corresponding to the signal from the second nucleic acid biomarker; and (ii) comparing the calculated ratio for each nucleus to a predetermined threshold. In some embodiments, the method further comprises assigning a first indicator to those tumor nuclei whose calculated ratio is above a predetermined threshold and assigning a second indicator to those tumor nuclei whose calculated ratio is equal to or below the predetermined threshold. In some embodiments, the method further comprises generating an overlay image based on the assigned first and second indicators. In some embodiments, the method further comprises generating a binned histogram of the calculated ratios of all identified nuclei. In some embodiments, the method further comprises ranking the assessed tumor nuclei according to the calculated ratio for each nucleus.

In some embodiments, the biological sample is stained for the presence of HER2 and chromosome 17 nucleic acid biomarkers. In embodiments in which the biological sample is stained for the presence of HER2 and chromosome 17 nucleic acid biomarkers, the method comprises detecting points in the mapped tissue region that meet absorbance intensity, black unmixed image channel intensity, red unmixed image channel intensity, and difference of Gaussians threshold criteria; and classifying the detected points as belonging to a black nucleic acid biomarker signal corresponding to HER2 or to a red nucleic acid biomarker signal corresponding to chromosome 17. In such embodiments, the tumor cell nuclei are assessed by: (i) calculating a ratio of the classified points belonging to the black nucleic acid biomarker signal to the classified points belonging to the red nucleic acid biomarker signal; and (ii) comparing the calculated ratio to a predetermined threshold. In some such embodiments, the at least one protein biomarker is a HER2 protein biomarker. In some such embodiments, the method further comprises identifying whether the patient is positive or negative for HER2 based on the assessed tumor cell nuclei. In some such embodiments, the method further comprises scoring the biological sample for the presence of at least one additional protein biomarker. In some embodiments, the at least one additional protein biomarker is EGFR.
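The point detection and classification step above can be illustrated with a small sketch that assumes the four features (absorbance, black and red unmixed channel intensities, and a difference of Gaussians response) have already been measured for each candidate location. The threshold values, the way the criteria are combined, and the rule of classifying by the stronger unmixed channel are all assumptions; the text names the criteria but does not specify how they are applied.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    absorbance: float
    black_channel: float  # intensity in the black unmixed channel
    red_channel: float    # intensity in the red unmixed channel
    dog_response: float   # difference of Gaussians response


@dataclass
class Thresholds:
    # Placeholder values, not taken from the disclosure.
    absorbance: float = 0.3
    channel: float = 0.2
    dog: float = 0.1


def is_point(candidate: Candidate, thresholds: Thresholds) -> bool:
    """Accept a candidate only if it passes every (assumed) criterion."""
    strongest_channel = max(candidate.black_channel, candidate.red_channel)
    return (
        candidate.absorbance >= thresholds.absorbance
        and strongest_channel >= thresholds.channel
        and candidate.dog_response >= thresholds.dog
    )


def classify_point(candidate: Candidate) -> str:
    """Label a point 'black' (HER2) or 'red' (chromosome 17) by stronger channel."""
    return "black" if candidate.black_channel >= candidate.red_channel else "red"
```

A trained classifier could replace the stronger-channel rule; the simple comparison is used here only to keep the example short.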

Another aspect of the disclosure is a non-transitory computer-readable medium storing instructions for assessing genetic aberrations in a biological sample stained for the presence of at least one nucleic acid biomarker, the instructions comprising: identifying cells in the first image that have been stained for the presence of at least one protein biomarker that meet a predetermined protein biomarker staining criterion; deriving a tumor tissue region in the first image encompassing identified cells that meet a predetermined protein biomarker staining criterion; registering the first and second images with a common coordinate system such that a derived tumor tissue region in the first image is mapped to the second image to provide a mapped tissue region, wherein the second image comprises a signal corresponding to the presence of at least one nucleic acid biomarker; detecting a point within the mapped tissue region corresponding to a signal from at least one nucleic acid biomarker; counting all detected points within each tumor cell nucleus in each mapped tissue region; and assessing whether each tumor cell nucleus in each mapped region has a genetic aberration based on the total number of counted points in each cell nucleus. In some embodiments, the predetermined protein biomarker staining criterion is a staining intensity threshold. In some embodiments, the genetic aberration is selected from the group consisting of an abnormal gene copy number and a chromosomal abnormality.
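The counting step above can be sketched as follows, assuming for simplicity that each tumor cell nucleus is approximated by a circle (center and radius) and that each detected point carries a label. Real nucleus segmentations would typically be pixel masks; the circular approximation and all names here are illustrative assumptions.

```python
import math
from collections import Counter


def count_points_per_nucleus(nuclei, points):
    """Count labeled points that fall inside each nucleus.

    nuclei: {nucleus_id: (center_x, center_y, radius)}
    points: iterable of (x, y, label)
    Returns {nucleus_id: Counter({label: count})}.
    """
    counts = {nucleus_id: Counter() for nucleus_id in nuclei}
    for x, y, label in points:
        for nucleus_id, (cx, cy, radius) in nuclei.items():
            if math.hypot(x - cx, y - cy) <= radius:
                counts[nucleus_id][label] += 1
                break  # assign each point to at most one nucleus
    return counts


# Assumed example data: two nuclei and five detected points.
nuclei = {1: (10, 10, 5), 2: (30, 30, 5)}
points = [
    (11, 10, "black"),
    (9, 12, "black"),
    (10, 13, "red"),
    (31, 30, "red"),
    (100, 100, "black"),  # outside every nucleus, so it is ignored
]
counts = count_points_per_nucleus(nuclei, points)
```

The per-nucleus totals can then be compared with a threshold, or combined into a ratio when two biomarkers are counted.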

In some embodiments, the biological sample is stained for the presence of at least two nucleic acid biomarkers, and the points corresponding to each of the at least two nucleic acid biomarkers are detected and counted for each cell nucleus. In some embodiments, tumor nuclei are assessed by: (i) calculating a ratio of the counted first points to the counted second points; and (ii) comparing the calculated ratio to a clinically relevant threshold.

In some embodiments, the non-transitory computer-readable medium further comprises instructions for generating an image overlay, wherein each assessed cell nucleus is assigned a color based on whether the calculated ratio is (i) at or below a clinically relevant threshold; or (ii) above the clinically relevant threshold. In some embodiments, the non-transitory computer-readable medium further comprises instructions for ranking the assessed nuclei in each mapped tissue region according to the calculated ratio. In some embodiments, the non-transitory computer-readable medium further comprises instructions for generating a binned histogram of the calculated ratios.

Drawings

For a general understanding of the features of the present disclosure, refer to the accompanying drawings. In the drawings, like reference numerals are used to identify like elements throughout the figures.

Fig. 1 illustrates a representative digital pathology system including an image acquisition device and a computer system, according to some embodiments.

Fig. 2 lists various modules that may be used in a digital pathology system or in a digital pathology workflow, according to some embodiments.

Fig. 3 sets forth a flow chart illustrating a method of detecting genetic aberrations within a biological sample according to some embodiments of the present disclosure.

Fig. 4 sets forth a flow chart illustrating a method of detecting genetic aberrations within a biological sample according to some embodiments of the present disclosure.

Fig. 5 sets forth a flow chart illustrating a method of predicting HER2 status according to some embodiments of the present disclosure.

Fig. 6 sets forth a flow chart illustrating steps for registering one or more images with a common coordinate system according to some embodiments of the present disclosure.

Fig. 7A shows a first slide stained for the presence of HER2 protein biomarker.

Fig. 7B shows a second slide stained for the presence of HER2 nucleic acid biomarker and chromosome 17 nucleic acid biomarker.

Fig. 8A shows a slide stained for the presence of HER2 protein biomarker.

Fig. 8B also shows the image analysis results of the membrane staining features.

Fig. 9 shows a global view of a sample stained for the presence of HER2 protein biomarker before (lower) and after (upper) HER2 image analysis.

Fig. 10 shows a global view of a sample stained for the presence of HER2 nucleic acid biomarker and chromosome 17 nucleic acid biomarker (lower), and the foreground segmentation mask generated after cell detection and classification (upper).

Fig. 11 shows the results of spot detection for the HER2 dual ISH assay (upper: red spots detected; lower: black spots detected).

Fig. 12 shows a workflow for testing HER2 protein biomarkers.

Detailed Description

It will also be understood that, unless indicated to the contrary, in any methods claimed herein that include more than one step or action, the order of the steps or actions of the method need not be limited to the order in which the steps or actions of the method are expressed.

As used herein, the singular forms "a", "an" and "the" include plural referents unless the context clearly dictates otherwise. Likewise, the word "or" is intended to include "and" unless the context clearly indicates otherwise. The term "comprising" is defined as inclusive, e.g., "comprising A or B" means including A, B or A and B.

As used herein in the specification and claims, "or" should be understood to have the same meaning as "and/or" as defined above. For example, when separating items in a list, "or" or "and/or" should be interpreted as inclusive, i.e., as including at least one element of a number or list of elements, but also including more than one element, and optionally including additional unlisted items. Only terms clearly indicating the contrary, such as "only one of" or "exactly one of," or, when used in the claims, "consisting of," refer to the inclusion of exactly one element of a number or list of elements. In general, the term "or" as used herein should be construed as indicating exclusive alternatives (i.e., "one or the other, but not both") only when preceded by an exclusive term, such as "either," "one of," "only one of," or "exactly one of." The term "consisting essentially of," when used in the claims, shall have its ordinary meaning as used in patent law.

The terms "comprising," "including," "having," and the like are used interchangeably and are intended to be synonymous. Similarly, "comprises," "includes," "has," and the like are used interchangeably and are intended to be equivalent. In particular, each term is defined consistent with the common U.S. patent law definition of "comprising," such that each term is to be interpreted as an open-ended term meaning "at least the following," and is not to be interpreted as excluding additional features, limitations, aspects, and the like. Thus, for example, "a device having components a, b, and c" means that the device includes at least components a, b, and c. Likewise, the phrase "a method involving steps a, b, and c" means that the method includes at least steps a, b, and c. Further, although the steps and processes may be outlined herein in a particular order, those skilled in the art will recognize that the order of the steps and processes may vary.

As used herein in the specification and in the claims, with respect to a list of one or more elements, the phrase "at least one" should be understood to mean at least one element selected from any one or more of the elements in the list of elements, but not necessarily including at least one of each element specifically listed in the list of elements, nor excluding any combination of elements in the list of elements. This definition also allows that, in addition to the elements specifically identified in the list of elements to which the phrase "at least one" refers, other elements are optionally present, whether related or not to the specifically identified elements. Thus, as a non-limiting example, "at least one of A and B" (or, equivalently, "at least one of A or B," or, equivalently, "at least one of A and/or B") can refer, in one embodiment, to at least one, optionally including more than one, A, with no B present (and optionally including elements other than B); in another embodiment, to at least one, optionally including more than one, B, with no A present (and optionally including elements other than A); in yet another embodiment, to at least one, optionally including more than one, A, and at least one, optionally including more than one, B (and optionally including other elements); and so on.

As used herein, the terms "biological sample," "tissue sample," "specimen," and similar terms refer to any sample that includes biomolecules (e.g., proteins, peptides, nucleic acids, lipids, carbohydrates, or combinations thereof) and is obtained from any organism, including viruses. Examples of other organisms include mammals (e.g., humans; veterinary animals such as cats, dogs, horses, cows, and pigs; and laboratory animals such as mice, rats, and primates), insects, annelids, arachnids, marsupials, reptiles, amphibians, bacteria, and fungi. Biological samples include tissue samples (e.g., tissue sections and needle biopsies of tissue), cellular samples (e.g., cytological smears, such as cervical smears or blood smears, or samples of cells obtained by microdissection), or cellular fractions, debris, or organelles (e.g., obtained by lysing cells and separating their components by centrifugation or other means). Other examples of biological samples include blood, serum, urine, semen, stool, cerebrospinal fluid, interstitial fluid, mucus, tears, sweat, pus, biopsy tissue (e.g., obtained by surgical biopsy or needle biopsy), nipple aspirate, cerumen, breast milk, vaginal secretion, saliva, a swab (e.g., a buccal swab), or any material that contains a biomolecule and is derived from a first biological sample. In certain embodiments, the term "biological sample" as used herein refers to a sample prepared from a tumor or a portion thereof obtained from a subject (e.g., a homogenized or liquefied sample).

As used herein, a "blob" or "point" (also referred to herein as a "dot" or "spot") is a region of a digital image in which some property is constant or nearly constant; in a sense, all pixels in a blob can be considered similar to each other. Depending on the particular application and the in situ signal to be detected, a dot typically comprises 5 to 60 pixels, where, for example, one pixel may correspond to about 0.25 microns by 0.25 microns of a tissue section.
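As a rough illustration of the size range given above, the following sketch converts a dot's pixel count into a physical area and checks whether a candidate blob falls within the typical 5 to 60 pixel range. The function names and the use of the 0.25 micron pixel size as a default are assumptions made for illustration only; they are not part of the disclosed system.

```python
# Illustrative sketch only; all names and defaults here are hypothetical.
MICRONS_PER_PIXEL = 0.25                  # example pixel size given in the text
MIN_DOT_PIXELS, MAX_DOT_PIXELS = 5, 60    # typical dot size range given in the text


def dot_area_um2(pixel_count, microns_per_pixel=MICRONS_PER_PIXEL):
    """Physical area, in square microns, covered by a blob of pixel_count pixels."""
    return pixel_count * microns_per_pixel ** 2


def is_plausible_dot(pixel_count):
    """True if the blob size lies within the typical dot range (inclusive)."""
    return MIN_DOT_PIXELS <= pixel_count <= MAX_DOT_PIXELS
```

Under these assumptions, a dot of 5 to 60 pixels covers roughly 0.31 to 3.75 square microns of tissue.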

As used herein, the phrase "dual in situ hybridization" or "DISH" refers to an in situ hybridization (ISH) method that uses two probes to detect two different target sequences. Typically, the two probes are labeled differently. In some embodiments, DISH may be an assay for determining the amplification status of the HER2 gene by contacting a tumor sample with a HER2-specific probe and a chromosome 17 centromere probe, and determining the ratio of HER2 genomic DNA to chromosome 17 centromere DNA (e.g., the ratio of HER2 gene copy number to chromosome 17 centromere copy number). The method uses a different detectable label and/or detection system for each of the HER2 genomic DNA and the chromosome 17 centromere DNA, such that each can be detected individually and visibly in a single sample.

As used herein, the term "EGFR" refers to the epidermal growth factor receptor, a member of the ErbB family of receptors, which is a subfamily of four closely related receptor tyrosine kinases: EGFR (ErbB-1), HER2/neu (ErbB-2), HER3 (ErbB-3), and HER4 (ErbB-4).

As used herein, the term "image data" encompasses raw image data acquired from a biological tissue sample, such as by an optical sensor or sensor array, or pre-processed image data. In particular, the image data may comprise a matrix of pixels.

As used herein, the terms "image," "image scan," or "scanned image" encompass raw image data acquired from a biological tissue sample, such as by an optical sensor or sensor array, or pre-processed image data. In particular, the image data may comprise a matrix of pixels.

As used herein, the term "multichannel image" or "multiplexed image" encompasses a digital image obtained from a biological tissue sample in which different biological structures, such as nuclei and tissue structures, are stained simultaneously using specific fluorescent dyes, quantum dots, chromogens, or the like, each of which fluoresces or is otherwise detectable in a different spectral band, thereby constituting one of the channels of the multichannel image.

As used herein, the term "probe" or "oligonucleotide probe" refers to a nucleic acid molecule for detecting a complementary nucleic acid target gene.

As used herein, the term "slide" refers to any substrate of any suitable size (e.g., a substrate made in whole or in part of glass, quartz, plastic, silicon, etc.) upon which a biological specimen can be placed for analysis, and more particularly to a standard 3 inch x 1 inch microscope slide or a standard 75 mm x 25 mm microscope slide. Examples of biological specimens that may be placed on slides include, but are not limited to, cytological smears, thin tissue sections (e.g., from a biopsy), and biological specimen arrays, such as tissue arrays, cell arrays, DNA arrays, RNA arrays, protein arrays, or any combination thereof. Thus, in one embodiment, tissue sections, DNA samples, RNA samples, and/or proteins are placed at specific locations on the slide. In some embodiments, the term "slide" can refer to SELDI and MALDI chips, as well as silicon wafers.

As used herein, the term "specific binding entity" refers to a member of a specific binding pair. A specific binding pair is a pair of molecules characterized in that they bind to each other to the substantial exclusion of binding to other molecules (e.g., the binding constant of a specific binding pair can be at least 10^3 M^-1, 10^4 M^-1, or 10^5 M^-1 greater than the binding constant of either member of the binding pair for other molecules in a biological sample). Specific examples of specific binding moieties include specific binding proteins (e.g., antibodies, lectins, avidins such as streptavidin, and protein A). Specific binding moieties may also include molecules (or portions thereof) that are specifically bound by such specific binding proteins.

As used herein, the term "stain," "staining," or similar term generally refers to any treatment of a biological specimen that detects and/or distinguishes the presence, location, and/or amount (e.g., concentration) of a particular molecule (e.g., a lipid, protein, or nucleic acid) or a particular structure (e.g., a normal or malignant cell, cytoplasm, nucleus, Golgi body, or cytoskeleton) in the biological specimen. For example, staining may provide contrast between a specific molecule or specific cellular structure and the surrounding parts of a biological specimen, and the intensity of staining may provide a measure of the amount of a specific molecule in the specimen. Staining may be used to aid in the observation of molecules, cellular structures, and organisms not only with bright field microscopes, but also with other viewing tools such as phase contrast microscopes, electron microscopes, and fluorescence microscopes. Some staining performed by the system may be used to make the outline of the cells clearly visible. Other staining performed by the system may rely on specific cellular components (e.g., molecules or structures) being stained while other cellular components are not stained or are stained relatively little. Examples of the types of staining methods performed by the system include, but are not limited to, histochemical methods, immunohistochemical methods, and other methods based on intermolecular reactions, including non-covalent binding interactions, such as hybridization reactions between nucleic acid molecules. Specific staining methods include, but are not limited to, primary staining methods (e.g., H&E staining, Pap staining, etc.), enzyme-linked immunohistochemistry methods, and in situ RNA and DNA hybridization methods, such as fluorescence in situ hybridization (FISH).

As used herein, the term "target" refers to any molecule whose presence, location and/or concentration is or can be determined. Examples of target molecules include proteins, nucleic acid sequences, and haptens, such as haptens covalently bound to proteins. Typically, one or more conjugates of specific binding molecules and a detectable label are used to detect the target molecule.

Overview

The present disclosure provides systems and methods for detecting genetic aberrations (e.g., high copy number or chromosomal abnormalities) in cells, such as cells automatically selected for assessment. In some embodiments, the cells automatically selected for assessment are located within a tumor tissue region that includes cells meeting a predetermined protein biomarker staining criterion. Subjectivity can be reduced or eliminated by automatically selecting cells for evaluation. It is believed that the automated systems and methods described herein can improve patient treatment outcomes and treatment regimen selection because the disclosed systems and methods utilize relatively more data than manual analysis methods.

Although examples herein may refer to a particular tissue and/or the application of a particular stain or detection probe to detect a particular biomarker (and thus a disease), one skilled in the art will recognize that different tissues and different stains/detection probes may be applied to detect different markers and different diseases. For example, while certain examples may refer to quantifying signals corresponding to HER2 and Chr17 nucleic acid biomarkers, the systems and methods described herein may be applied to detecting and quantifying signals from a single nucleic acid probe, any combination of two or more nucleic acid probes, and the like. Indeed, the systems and methods described herein may be applied to determine gene copy number or chromosomal aberrations using any ISH assay or dual ISH assay (including those that utilize chromogens or fluorophores, or any combination thereof, as labels).

In the context of the HER2 status of breast and/or gastric cancer, treatment targeting HER2 has achieved very good clinical results, which further underscores the need for accurate determination of HER2 status. Because trastuzumab and lapatinib act relatively specifically on cancer cells overexpressing HER2, these drugs are well tolerated by patients, with minimal toxic side effects. Therefore, determining the HER2 status of a breast or gastric cancer is an important step in selecting a treatment regimen.

The HER2 protein is expressed in the cell membrane of both normal and neoplastic human breast tissue. The human HER2 gene is located on chromosome 17 and encodes the HER2 protein. Overexpression of the HER2 protein, amplification of the HER2 gene, or both, occurs in approximately 15% to 25% of breast cancers and is thought to be associated with aggressive tumor behavior. Breast cancer cells can carry up to 25 to 50 copies of the HER2 gene and can show up to a 40-fold to 100-fold increase in HER2 protein, such that up to 2 million receptors are expressed on the tumor cell surface. This difference in HER2 expression between normal tissues and tumors helps to define HER2 as an ideal therapeutic target.

Traditionally, tissue sections are examined for the pattern and intensity of staining, including the completeness of cell membrane staining (see fig. 12). In the context of breast tissue stained for the presence of the HER2 protein, staining that completely surrounds the cell membrane is scored as "2+" or "3+". Partial, incomplete staining of the cell membrane is scored as "1+". The most difficult cases to interpret are those that fall on the borderline between the intensity levels "1+" and "2+" or in which different expression levels are intermingled. In these cases, an alternative test using ISH (such as HER2 dual ISH) may be used for further interpretation.

Visualization is achieved by dual ISH staining, in which HER2 appears as a discrete black signal (SISH) and Chr17 appears as a red signal (Red ISH) in the nuclei of normal cells as well as cancer cells. This strategy allows the chromosomal status of the HER2 gene to be determined. The copy numbers of the two probes are counted in the tumor cell nuclei, and the HER2/Chr17 ratio is reported as the counting result, from which the amplification status of HER2 is determined (a HER2/Chr17 ratio greater than or equal to 2.0 is amplified; a ratio less than 2.0 is non-amplified).

In the manual workflow, a pathologist visually observes dual ISH tissue sections under a microscope and visually selects a group of twenty to forty tumor cells, records the counts of the two probes (red dots and silver/black dots) in each cell, calculates the ratio of the sum of the silver/black dots to the sum of the red dots, and compares the calculated ratio to a threshold (2.0), thereby classifying the patient tissue section as dual ISH positive or dual ISH negative. In the algorithmic workflow, dual ISH tissue sections are digitized using a digital microscope or whole slide scanner (as described herein), and the digital images are reviewed by a pathologist in a software application (e.g., Virtuoso software by Ventana Medical Systems, Inc., Tucson, AZ) in which the whole slide can be viewed and image regions are manually selected for analysis. The pathologist digitally marks twenty to forty cells to be used in calculating the slide score. The marked cells are then analyzed with an image analysis algorithm that automatically detects and outputs the number of red dots and black/silver dots in each cell, and the slide score and the dual ISH positive/negative status of the slide are output, using the same scoring criterion as the manual method, for further review and approval by the pathologist. Both of these processes are considered relatively time consuming. In some embodiments, the present disclosure provides a faster, more efficient workflow for determining dual ISH positive/negative status. Compared with the manual process, the workflow is able to analyze more cells and/or other structures, which enhances the analysis and may improve patient treatment and outcomes.
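The scoring rule described above, in which the sum of the black/silver dots is divided by the sum of the red dots and the result is compared with the 2.0 threshold, can be sketched as follows. This is a minimal illustration assuming that per-cell dot counts are already available as pairs; the function name, data layout, and status labels are hypothetical.

```python
# Illustrative sketch only; names and data layout are hypothetical.
AMPLIFICATION_THRESHOLD = 2.0  # HER2/Chr17 ratio cutoff stated in the text


def slide_score(cell_counts):
    """Score a slide from per-cell (black_dots, red_dots) count pairs.

    Returns (ratio, status), where ratio = sum(black) / sum(red) over all
    scored cells and status is "positive" when ratio >= the threshold.
    """
    cells = list(cell_counts)
    total_black = sum(black for black, _ in cells)
    total_red = sum(red for _, red in cells)
    if total_red == 0:
        raise ValueError("no red (Chr17) dots counted; ratio is undefined")
    ratio = total_black / total_red
    status = "positive" if ratio >= AMPLIFICATION_THRESHOLD else "negative"
    return ratio, status
```

Note that the ratio is taken over the summed counts rather than averaged per cell, matching the manual procedure described in the text.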

In view of the foregoing, in some embodiments, the present disclosure provides systems and methods for the clinical interpretation of dual ISH slides, in which (i) the cells to be scored are automatically selected, thereby mitigating the subjectivity of manual cell selection; (ii) a greater number of cells can be considered for scoring than in the traditional method, i.e., more than 20 cells can be analyzed; and (iii) relevant feedback (e.g., a visualization) may be provided to the pathologist for a more robust (and faster) analysis. Ultimately, the systems and methods of the present disclosure can enhance patient care and improve patient treatment outcomes.

At least some embodiments of the present disclosure relate to digital pathology systems and methods for analyzing image data captured from biological samples, including tissue samples, stained with one or more primary stains, such as hematoxylin and eosin (H&E), and one or more detection probes, such as probes containing specific binding entities that facilitate labeling of targets within the sample. Fig. 1 illustrates a digital pathology system 200 for imaging and analyzing a specimen, according to some embodiments. In some embodiments, the digital pathology system includes, for example, a digital data processing device (e.g., a computer including an interface for receiving image data from a slide scanner, a camera, a network, and/or a storage medium). In other embodiments, the digital pathology system 200 may include an imaging device 12 (e.g., a device having means for scanning a specimen-bearing microscope slide) and a computer 14, whereby the imaging device 12 and the computer 14 may be communicatively coupled together (e.g., directly, or indirectly via a network 20). The computer system 14 may include a desktop computer, a laptop computer, a tablet computer, or the like, digital electronic circuitry, firmware, hardware, memory, computer storage media, computer programs or sets of instructions (e.g., as stored in the memory or storage media), one or more processors (including programmed processors), and any other hardware, software, or firmware modules or combinations thereof. For example, the computer system 14 shown in FIG. 1 may include a computer having a display device 16 and a housing 18. The computer may store digital images in binary form (locally, such as in memory, on a server, or on another network-connected device). A digital image may also be divided into a matrix of pixels. Each pixel may comprise a digital value of one or more bits, as defined by the bit depth.
One skilled in the art will recognize that other computer devices or systems may be utilized, and that the computer system described herein may be communicatively coupled with additional components (e.g., specimen analyzers, microscopes, other imaging systems, automated slide preparation devices, etc.). Some of these additional components are further described herein, as well as various available computers, networks, and the like.

In general, the imaging device 12 (or other image source, including pre-scanned images stored in memory) may include, but is not limited to, one or more image capture devices. The image capture device may include, but is not limited to, a camera (e.g., an analog camera, a digital camera, etc.), optics (e.g., one or more lenses, a sensor focus lens group, a microscope objective, etc.), an imaging sensor (e.g., a charge-coupled device (CCD), a complementary metal-oxide semiconductor (CMOS) image sensor, etc.), photographic film, etc. In digital embodiments, the image capture device may include a plurality of lenses that cooperate to provide on-the-fly focusing. An image sensor, such as a CCD sensor, may capture a digital image of the specimen. In some embodiments, the imaging device 12 is a bright field imaging system, a multispectral imaging (MSI) system, or a fluorescence microscopy system. The digitized tissue data may be generated, for example, by an image scanning system such as the VENTANA iScan HT scanner of Ventana Medical Systems, Inc. (Tucson, Arizona) or another suitable imaging device. Other imaging devices and systems are further described herein. Those skilled in the art will recognize that a digital color image acquired by the imaging device 12 is conventionally composed of elementary color pixels. Each color pixel may be encoded on three digital components, each component containing the same number of bits and corresponding to one of the primary colors, typically red, green, or blue, also denoted by the term "RGB" components.
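To make the pixel encoding described above concrete, the sketch below packs three equally sized components into a single pixel value and unpacks them again. It assumes 8 bits per component (a common choice, but not one stated in this disclosure), and the function names are hypothetical.

```python
# Illustrative sketch only; the 8-bit depth and all names are assumptions.
BITS_PER_COMPONENT = 8
MAX_COMPONENT = (1 << BITS_PER_COMPONENT) - 1  # 255 for 8 bits


def pack_rgb(red, green, blue):
    """Encode three components of equal bit width into one integer pixel value."""
    for value in (red, green, blue):
        if not 0 <= value <= MAX_COMPONENT:
            raise ValueError("component out of range for the assumed bit depth")
    return (red << 2 * BITS_PER_COMPONENT) | (green << BITS_PER_COMPONENT) | blue


def unpack_rgb(pixel):
    """Recover the (red, green, blue) components from a packed pixel value."""
    return (
        (pixel >> 2 * BITS_PER_COMPONENT) & MAX_COMPONENT,
        (pixel >> BITS_PER_COMPONENT) & MAX_COMPONENT,
        pixel & MAX_COMPONENT,
    )
```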

Fig. 2 outlines the various modules utilized within the digital pathology system 200 of the present disclosure. In some embodiments, the digital pathology system 200 employs a computer device or computer-implemented method having one or more processors 220 and at least one memory 201, the at least one memory 201 storing non-transitory computer-readable instructions for execution by the one or more processors 220 to cause the one or more processors 220 to execute instructions (or store data) in one or more modules (e.g., modules 202-211).

Referring also to fig. 2, in some embodiments, the system may include: (a) an imaging module 202 adapted to generate image data of a stained biological sample, such as a first image stained for the presence of one or more protein biomarkers and a second image stained for the presence of one or more nucleic acid biomarkers; (b) an unmixing module 203 for unmixing an acquired image having more than one stain into single channel images; (c) a cell detection and classification module 204 for detecting and classifying stained cells, such as cells stained for nuclear or membrane protein biomarkers; (d) a scoring module 205 for assessing staining intensity and/or deriving an expression score; (e) a tissue region identification module 206 for identifying different tissue regions, such as tumor tissue regions; (f) an image registration module 207 for mapping regions in the first image to corresponding regions in the second image; (g) a spot detection module 208 for identifying signals corresponding to one or more nucleic acid biomarkers; (h) a point classification module 209 for classifying the identified signals as signals corresponding to particular nucleic acid biomarkers; (i) a point count and classification module 210 for returning a total count of all detected and classified points within a cell or nucleus; and (j) a visualization module 211 for generating an overlay image or certain graphs (such as a binned histogram) based on the counted points and any data derived therefrom. Each of these modules is described in greater detail herein.

Referring to figs. 2-5, the present disclosure provides a computer-implemented system and method for assessing whether cells within an image of a biological sample stained for the presence of one or more biomarkers have genetic aberrations, such as abnormally high gene copy numbers, certain chromosomal abnormalities, and the like. In some embodiments, the system runs a plurality of modules (e.g., modules 204 and 205) to identify cells in the first image, stained for the presence of one or more protein biomarkers, that meet a predetermined protein biomarker expression level, such as a predetermined minimum staining intensity level (step 311). For example, in the context of the HER2 protein biomarker, modules 204 and 205 are used to identify those stained cells that meet a minimum membrane staining intensity level. According to this example, and depending on the minimum threshold established, cells meeting the minimum membrane staining intensity level are likely to exhibit an abnormal HER2 gene status.

Subsequently, a tissue region (e.g., a tumor tissue region) in the first image encompassing the identified cells satisfying the predetermined protein biomarker expression level is identified (step 312; see also step 412) (e.g., with tissue region identification module 206). Continuing with the above example, in the context of HER2, an epithelial tumor tissue region encompassing the identified cells that meet the lowest membrane staining intensity threshold is derived. It is believed that in these identified regions of tumor tissue, cells with genetic aberrations are likely to be found.
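The two steps above (keeping cells that meet a minimum staining level, then deriving a region that encompasses them) can be sketched as follows. This is a simplified illustration under stated assumptions: the `DetectedCell` record, the normalized intensity score, and the use of an axis-aligned bounding box as the "region" are all hypothetical stand-ins, and the bounding box is not the tissue region identification method of module 206.

```python
# Illustrative sketch only; the data model and the bounding-box region are assumptions.
from dataclasses import dataclass


@dataclass(frozen=True)
class DetectedCell:
    x: float
    y: float
    membrane_intensity: float  # hypothetical normalized staining score in [0, 1]


def select_stained_cells(cells, min_intensity):
    """Keep only the cells whose staining meets the minimum intensity level."""
    return [cell for cell in cells if cell.membrane_intensity >= min_intensity]


def bounding_region(cells, margin=0.0):
    """Axis-aligned box (x0, y0, x1, y1) enclosing the cells, or None if empty."""
    if not cells:
        return None
    xs = [cell.x for cell in cells]
    ys = [cell.y for cell in cells]
    return (min(xs) - margin, min(ys) - margin, max(xs) + margin, max(ys) + margin)
```

In practice a tumor region would likely follow tissue morphology rather than a rectangle; the box is used here only to keep the example short.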

In some embodiments, the identified tissue regions are mapped from the first image to corresponding regions in the second image (step 313; see also step 413) (e.g., using image registration module 207). For example, if the first image is of a first serial tissue section and the second image is of a second serial tissue section, image registration techniques allow structures, objects, or regions identified in the first image to be located in the corresponding second image. In the above example, the identified epithelial tumor tissue regions are transferred from the first image to the second image once image registration is complete. In this way, the cells within the second image that belong to the identified epithelial tumor tissue regions can be analyzed for genetic aberrations.
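Once a registration transform between the two images is known, transferring a region amounts to applying that transform to the region's outline. The sketch below assumes the registration result is available as a 2 x 3 affine matrix and that a region is a list of polygon vertices; both are assumptions for illustration, and how the transform is estimated is outside the scope of this example.

```python
# Illustrative sketch only; assumes registration has already produced an affine matrix.
def apply_affine(point, matrix):
    """Map (x, y) through matrix = ((a, b, tx), (c, d, ty))."""
    x, y = point
    (a, b, tx), (c, d, ty) = matrix
    return (a * x + b * y + tx, c * x + d * y + ty)


def map_region(polygon, matrix):
    """Transfer a region outline (list of vertices) from the first image to the second."""
    return [apply_affine(vertex, matrix) for vertex in polygon]
```

For instance, a pure translation of 10 pixels in x and -5 pixels in y corresponds to the matrix `((1, 0, 10), (0, 1, -5))`.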

Next, a plurality of modules (e.g., module 208, module 209, and module 210) are used to detect and/or quantify signals corresponding to one or more nucleic acid biomarkers within the mapped tissue region (step 314; see also step 414). In some embodiments, the signals corresponding to the one or more nucleic acid biomarkers are dots or spots. The detected and/or quantified signals may then be used to assess whether nuclei, such as tumor nuclei, in the mapped tissue region have genetic aberrations, such as high copy number or chromosomal abnormalities (step 315; see also step 415). In the context of the HER2 example, nuclei within an identified and registered epithelial tumor tissue region may be assessed as having, for example, normal gene copy and ploidy status (2 signals for HER2 and 2 signals for chromosome 17); HER2 amplification; chromosome 17 polysomy; and/or chromosome 17 polysomy and HER2 amplification.
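The four example categories above can be illustrated with a simple rule-based sketch operating on the per-nucleus dot counts. The decision rules here are assumptions made only for illustration: amplification is taken as a HER2/Chr17 ratio of at least 2.0 (the cutoff given earlier in the text), and polysomy is taken as more than two chromosome 17 signals. The disclosure does not state these rules in this form, and the "indeterminate" fallback is likewise hypothetical.

```python
# Illustrative sketch only; the decision rules below are simplifying assumptions.
def classify_nucleus(her2_dots, chr17_dots, normal_copies=2, ratio_cutoff=2.0):
    """Assign one of the example categories from per-nucleus dot counts."""
    amplified = chr17_dots > 0 and her2_dots / chr17_dots >= ratio_cutoff
    polysomy = chr17_dots > normal_copies
    if amplified and polysomy:
        return "chromosome 17 polysomy and HER2 amplification"
    if amplified:
        return "HER2 amplification"
    if polysomy:
        return "chromosome 17 polysomy"
    if her2_dots == normal_copies and chr17_dots == normal_copies:
        return "normal"
    return "indeterminate"  # counts that fit none of the example categories
```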

The assessments performed (step 315; see also step 415) may then be visualized using the visualization module 211 or may be stored in the database 240 (e.g., the assessments of individual nuclei and their locations are stored; the assessment of the entire mapped tissue region is stored, etc.). For example, an overlay image may be generated and then superimposed on the whole slide image or any portion thereof. In the context of HER2, and by way of example only, those cells having a ratio of black dots to red dots of greater than 2 may be visualized in one color, while those cells having a ratio of less than or equal to 2 may be visualized in a second, different color. Likewise, a binned histogram of the calculated ratios may be generated for storage or output.
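The two visualization outputs mentioned above, a per-cell color keyed to the black-to-red dot ratio and a binned histogram of ratios, can be sketched as follows. The color names, the bin width of 0.5, and the function names are arbitrary choices for illustration; the text specifies only the cutoff of 2 and that two different colors are used.

```python
# Illustrative sketch only; colors, bin width, and names are hypothetical.
def cell_color(black_dots, red_dots, cutoff=2.0, high_color="red", low_color="blue"):
    """Pick the overlay color for a cell; None when no ratio can be formed."""
    if red_dots == 0:
        return None
    return high_color if black_dots / red_dots > cutoff else low_color


def ratio_histogram(ratios, bin_width=0.5):
    """Count ratios per bin; keys are the lower edges of the non-empty bins."""
    counts = {}
    for ratio in ratios:
        bin_start = int(ratio // bin_width) * bin_width
        counts[bin_start] = counts.get(bin_start, 0) + 1
    return dict(sorted(counts.items()))
```

A cell with a ratio of exactly 2 receives the second color here, following the "less than or equal to 2" wording of the example.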

Those skilled in the art will recognize that additional modules or databases not depicted in FIG. 2 may be incorporated into the workflow. For example, the image pre-processing module may be operated to apply certain filters to the acquired images or to identify certain histological and/or morphological structures within the tissue sample. In addition, a target region selection module may be utilized to select a particular portion of an image to perform an analysis.

Image acquisition module

In some embodiments, referring to fig. 2, the digital pathology system 200 operates the imaging module 202 to capture images or image data (e.g., from the scanning device 12) of a biological sample having one or more stains (step 310; see also step 410). In some embodiments, the received or acquired image is an RGB image or a multispectral image (e.g., a multi-channel brightfield and/or darkfield image). In some embodiments, the captured image is stored in memory 201.

Images or image data (used interchangeably herein) may be acquired using the scanning device 12, for example, in real time. In some embodiments, the image is acquired from a microscope or other instrument capable of capturing image data of a specimen-bearing microscope slide, as described herein. In some embodiments, the image is acquired using a two-dimensional scanner, such as a scanner capable of scanning image tiles, or a line scanner capable of scanning the image line by line, such as a VENTANA DP200 scanner. Alternatively, the images may be images that have been previously acquired (e.g., scanned) and stored in the one or more memories 201 (or images retrieved from a server over the network 20).

In some embodiments, the image received as input is a whole slide image. In other embodiments, the image received as input is a portion of a whole slide image. In some embodiments, the whole slide image is broken down into several portions, such as tiles, and each portion or tile may be analyzed independently (e.g., using the modules listed in fig. 2 and at least the methods shown in fig. 3 and 4). After the portions or tiles are analyzed individually, the data for each portion or tile may be stored individually and/or reported at the whole slide level.
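A minimal sketch of the tiling approach described above is shown below: the image extent is split into fixed-size tiles, each tile would be analyzed on its own, and the per-tile results are then combined for slide-level reporting. The tile size, the function names, and the representation of per-tile results as (black, red) dot counts are assumptions for illustration.

```python
# Illustrative sketch only; names and the per-tile result format are hypothetical.
def tile_bounds(width, height, tile_size):
    """Yield (x0, y0, x1, y1) boxes covering the image; edge tiles may be smaller."""
    for y0 in range(0, height, tile_size):
        for x0 in range(0, width, tile_size):
            yield (x0, y0, min(x0 + tile_size, width), min(y0 + tile_size, height))


def aggregate_counts(per_tile_counts):
    """Combine per-tile (black_dots, red_dots) counts into slide-level totals."""
    counts = list(per_tile_counts)
    return (sum(black for black, _ in counts), sum(red for _, red in counts))
```

Tiles are yielded row by row, so results can be stored per tile in a predictable order before being aggregated.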

The biological sample may be stained by applying one or more stains, and the resulting image or image data includes signals corresponding to each of the one or more stains. In some embodiments, the input image is a simplex image having only a single stain (e.g., stained with 3,3'-diaminobenzidine (DAB)). In some embodiments, the biological sample may be stained in a multiplex assay for two or more stains (thereby providing a multiplexed image). In some embodiments, the biological sample is stained for at least two biomarkers. In other embodiments, the biological sample is stained for the presence of at least two biomarkers and is also stained with a primary stain (such as hematoxylin). In some embodiments, the biological sample is stained for the presence of at least one protein biomarker and at least two nucleic acid biomarkers (e.g., DNA, RNA, microRNA, etc.).

In some embodiments, the biological sample is stained in an immunohistochemistry assay for the presence of one or more protein biomarkers. For example, the biological sample may be stained for the presence of the human epidermal growth factor receptor 2 protein (HER2 protein). Currently, there are two Food and Drug Administration (FDA) approved methods for HER2 assessment in the United States: HercepTest™ (DAKO, Glostrup, Denmark) and the HER2/neu (4B5) rabbit monoclonal primary antibody (Ventana, Tucson, Arizona).

In other embodiments, the biological sample is stained for the presence of estrogen receptor (ER), progesterone receptor (PR), or Ki-67. In other embodiments, the biological sample is stained for the presence of EGFR or HER3. Examples of other protein biomarkers are described by Zamay et al., "Current and Prospective Protein Biomarkers of Lung Cancer," Cancers (Basel), November 2018; 9(11), the disclosure of which is incorporated herein by reference in its entirety. Examples of protein biomarkers described by Zamay include CEACAM, CYFRA21-1, PKLK, VEGF, BRAF, and SCC.

In other embodiments, the biological sample is stained for the presence of one or more nucleic acids, including mRNA, in an In Situ Hybridization (ISH) assay. U.S. patent No. 7,087,379, the disclosure of which is incorporated herein by reference in its entirety, describes a method of staining a sample with an ISH probe so that a single spot (or dot) representing a single gene copy can be observed and detected. In some embodiments, multiple target genes are analyzed simultaneously by exposing a cell or tissue sample to multiple nucleic acid probes that have been labeled with multiple different nucleic acid tags.

For example, the INFORM HER2 dual ISH DNA probe cocktail assay from Ventana Medical Systems, Inc. (Tucson, AZ) is intended to determine the status of the HER2 gene by calculating the ratio of the HER2 gene to chromosome 17. The HER2 and chromosome 17 probes are detected on formalin-fixed, paraffin-embedded tissue samples, such as human breast cancer tissue specimens or human gastric cancer tissue specimens, using two-color chromogenic in situ hybridization. For the HER2 dual ISH assay, the signals are a silver signal ("black signal") and a red signal, corresponding to black dots and red dots, respectively, in the input image. For the HER2 dual ISH assay, cell-based scoring involves counting the red and black dots within the selected cells, wherein HER2 gene copies are represented by black dots and chromosome 17 is represented by red dots.

In some embodiments, the biological sample is stained for at least a HER2 protein biomarker, and for HER2 and chromosome 17 nucleic acid biomarkers. In some embodiments, the biological sample is stained for at least a HER2 protein biomarker, for HER2 and chromosome 17 nucleic acid biomarkers, and for at least one additional protein biomarker (e.g., ER, PR, Ki-67, etc.). For example, a first serial tissue section may be stained for the HER2 protein biomarker (and optionally other protein biomarkers), and a second serial tissue section may be stained using a HER2 dual ISH probe cocktail (see, e.g., fig. 7A and 7B). In some embodiments, the biological sample is stained for at least an EGFR protein biomarker, and for an EGFR/CEP nucleic acid biomarker.

The chromogenic stain may include hematoxylin, eosin, fast red, or 3,3'-diaminobenzidine (DAB). In some embodiments, the tissue sample is stained with a primary stain (e.g., hematoxylin). In some embodiments, the tissue sample is stained with a secondary stain (e.g., eosin). In some embodiments, the tissue sample is stained for a specific biomarker in an IHC assay. Of course, one skilled in the art will recognize that any biological sample may also be stained with one or more fluorophores.

A typical biological sample is processed in an automated staining/assay platform that applies a stain to the sample. A number of commercial products on the market are suitable for use as the staining/assay platform, one example being the DISCOVERY™ product from Ventana Medical Systems, Inc. (Tucson, AZ). The camera platform may also include a bright field microscope, such as a VENTANA iScan HT or VENTANA DP200 scanner from Ventana Medical Systems, Inc., or any microscope having one or more objective lenses and a digital imager. Other techniques may be used to capture images at different wavelengths. Further, camera platforms suitable for imaging stained biological specimens are known in the art and are commercially available from companies such as Zeiss, Canon, and Applied Spectral Imaging, and such platforms are readily adaptable for use in the systems, methods, and devices of the present disclosure.

In some embodiments, the input image is masked such that only tissue regions are present in the image. In some embodiments, a tissue region mask is generated to mask non-tissue regions from tissue regions. In some embodiments, a tissue region mask may be created by identifying the tissue regions and automatically or semi-automatically (i.e., with minimal user input) excluding background regions (e.g., regions of a whole slide image corresponding to glass with no sample, such as regions where only white light from the imaging source is present). One skilled in the art will recognize that, in addition to masking non-tissue regions from tissue regions, the tissue masking module may also mask other areas of interest as needed, such as a portion of tissue identified as belonging to a certain tissue type or to a suspected tumor region. In some embodiments, a segmentation technique is used to generate the tissue region mask image by masking tissue regions from non-tissue regions in the input image. Suitable segmentation techniques are known in the art (see Digital Image Processing, Third Edition, Rafael C. Gonzalez, Richard E. Woods, chapter 10, page 689, and Handbook of Medical Imaging, Processing and Analysis, Isaac N. Bankman, Academic Press, 2000, chapter 2). In some embodiments, an image segmentation technique is used to distinguish between the digitized tissue data and the slide in the image, the tissue corresponding to the foreground and the slide corresponding to the background. In some embodiments, the component computes an area of interest (AOI) in a whole slide image in order to detect all tissue regions in the AOI while limiting the amount of background non-tissue area that is analyzed. A variety of image segmentation techniques (e.g., HSV color-based image segmentation, Lab image segmentation, mean-shift color image segmentation, region growing, level set methods, fast marching methods, etc.) may be used to determine, for example, the boundaries of the tissue data and the non-tissue or background data. Based at least in part on the segmentation, the component can also generate a tissue foreground mask that can be used to identify those portions of the digitized slide data that correspond to the tissue data. Alternatively, the component can generate a background mask that identifies those portions of the digitized slide data that do not correspond to the tissue data.

This identification may be achieved through image analysis operations such as edge detection. The tissue region mask may be used to remove non-tissue background noise (e.g., non-tissue regions) from the image. In some embodiments, the generation of the tissue region mask comprises one or more of the following operations (but is not limited to them): computing the luminance of a low-resolution input image to produce a luminance image; applying a standard deviation filter to the luminance image to produce a filtered luminance image; and applying a threshold to the filtered luminance image, such that pixels with a value above a given threshold are set to 1 and pixels below the threshold are set to 0, to produce the tissue region mask. Additional information and examples relating to the generation of tissue region masks are disclosed in U.S. publication No. 2017/0154420, entitled "An Image Processing Method and System for Analyzing a Multi-Channel Image Obtained from a Biological Tissue Sample Being Stained by Multiple Stains," the disclosure of which is incorporated herein by reference in its entirety.
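
The three operations listed above (luminance, standard deviation filter, threshold) can be sketched as follows. This is only an illustration under stated assumptions: the luminance weights, the window size, and the threshold value are not given in the text and are chosen here for demonstration, and the function name `tissue_region_mask` is hypothetical.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def tissue_region_mask(rgb, window=5, threshold=0.02):
    """Return a 0/1 mask that is 1 where local luminance variation is high.

    rgb: (H, W, 3) low-resolution image with values in [0, 1].
    window: odd side length of the standard deviation filter (assumed value).
    threshold: cut-off on the filtered luminance (assumed value).
    """
    rgb = np.asarray(rgb, dtype=float)
    # Step 1: luminance image. Rec. 601 weights are an assumption.
    luminance = rgb @ np.array([0.299, 0.587, 0.114])
    # Step 2: standard deviation filter over a window x window neighborhood.
    pad = window // 2
    padded = np.pad(luminance, pad, mode="reflect")
    windows = sliding_window_view(padded, (window, window))
    filtered = windows.std(axis=(-2, -1))
    # Step 3: threshold; pixels above the threshold become 1, others 0.
    return (filtered > threshold).astype(np.uint8)
```

Featureless glass has almost no local luminance variation, so it falls below the threshold, whereas textured tissue does not.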

Unmixing module

In some embodiments, the image received as input may be a multiplexed image, i.e., an image of a biological sample stained with more than one stain (e.g., an image stained for the presence of HER2 and chromosome 17 probes; an image stained for the presence of a protein biomarker or a nucleic acid biomarker). In these embodiments, the multiplexed image is first unmixed into its constituent channels, e.g., with the unmixing module 203, before further processing, where each unmixed channel corresponds to a particular stain or signal.

In some embodiments, for a sample containing one or more stains, an individual image may be generated for each channel of the one or more stains. Those skilled in the art will recognize that features extracted from these channels can be used to describe different biological structures (e.g., nuclei, membranes, cytoplasm, nucleic acids, etc.) present in any image of a tissue.

For example, in the context of the HER2 dual ISH probe described herein, unmixing will generate a first unmixed image channel image with a silver (or black) signal (corresponding to black dots), a second unmixed image channel image with a red signal (corresponding to red dots), and a third unmixed image channel image with a hematoxylin signal. Each of the unmixed black and red dot images can be used as an input to the dot detection and classification module described herein. By way of another non-limiting example, an input image stained for the presence of two protein biomarkers (e.g., HER2 and Ki-67) would be unmixed into a first image channel image with a signal corresponding to HER2 (e.g., DAB membrane staining) and a second image channel image with a signal corresponding to Ki-67. Likewise, the HER2 protein biomarker image and the Ki-67 protein biomarker image may be used as input images for the detection and classification of cells as described herein.

In some embodiments, the multispectral image provided by the imaging module 202 is a weighted mixture of the underlying spectral signals associated with the individual biomarkers and noise components. At any particular pixel, the mixing weights are proportional to the expression of the underlying co-localized biomarkers at that location in the tissue and to the background noise at that location. Thus, the mixing weights vary from pixel to pixel. The spectral unmixing methods disclosed herein decompose the multi-channel pixel value vector at each pixel into a collection of constituent biomarker end-members or components, and estimate the proportions of the individual constituent stains for each of the biomarkers.

Unmixing refers to the process of decomposing the measured spectrum of a mixed pixel into a set of constituent spectra, or end-members, and a corresponding set of fractions, or abundances, that indicate the proportion of each end-member present in the pixel. In particular, the unmixing process may extract stain-specific channels, so that the local concentrations of individual stains can be determined using reference spectra that are well known for standard types of tissue and stain combinations. The unmixing may use reference spectra retrieved from a control image or estimated from the image under observation. Unmixing the component signals of each input pixel enables retrieval and analysis of stain-specific channels, such as the hematoxylin and eosin channels in H&E images, or the diaminobenzidine (DAB) channel and a counterstain (e.g., hematoxylin) channel in IHC images. The terms "unmixing" and "color deconvolution" (or "deconvolution"), and similar terms (such as "deconvolving" and "unmixed"), are used interchangeably in the art.

In some embodiments, the multiplexed image is unmixed with the unmixing module 203 using linear unmixing. Linear unmixing is described, for example, in Zimmermann, "Spectral Imaging and Linear Unmixing in Light Microscopy," Adv Biochem Engin/Biotechnol (2005) 95:245-. In linear stain unmixing, the measured spectrum (S(λ)) at any pixel is considered to be a linear mixture of stain spectral components and equals the sum of the color references (R(λ)) of each individual stain expressed at that pixel, each weighted by its proportion or weight (A):

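$$S(\lambda) = A_1 R_1(\lambda) + A_2 R_2(\lambda) + A_3 R_3(\lambda) + \dots + A_i R_i(\lambda)$$

More generally, it can be expressed in matrix form as

$$S(\lambda) = \sum_i A_i R_i(\lambda) \quad \text{or} \quad S = R \cdot A$$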

For example, if there are M acquired channel images and N individual stains, then the columns of the M x N matrix R are the optimal color system as derived herein, the N x 1 vector A is the unknown vector of the proportions of the individual stains, and the M x 1 vector S is the multi-channel spectral vector measured at a pixel. In these equations, the signal in each pixel (S) is measured during acquisition of the multiplexed image, and the reference spectra, i.e., the optimal color system, are derived as described herein. The contributions of the various stains (A_i) can be determined by calculating their contribution to each point in the measured spectrum. In some embodiments, the solution is obtained using an inverse least squares fitting approach that minimizes the squared difference between the measured and calculated spectra by solving the following set of equations:

$$\frac{\partial}{\partial A_i} \sum_j \Big\{ S(\lambda_j) - \sum_i A_i R_i(\lambda_j) \Big\}^2 = 0$$

In this equation, j represents the number of detection channels and i equals the number of stains. The solution of the linear equations often involves a constrained unmixing that forces the weights (A) to sum to unity.
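
As an illustration of the matrix form S = R·A, the following minimal sketch solves for the stain proportions of several pixels at once by ordinary (unconstrained) least squares. The reference matrix values are placeholders invented for the example, not measured reference spectra for any stain named in this disclosure; the function name `unmix_pixels` is hypothetical, and the sum-to-unity constraint is not enforced.

```python
import numpy as np


def unmix_pixels(S, R):
    """Least-squares estimate of stain proportions.

    S: (M, P) measured values, M channels for each of P pixels.
    R: (M, N) reference matrix, one column per stain.
    Returns A with shape (N, P), minimizing ||S - R @ A||^2 per pixel.
    """
    A, *_ = np.linalg.lstsq(R, S, rcond=None)
    return A


# Placeholder reference matrix (M = 3 channels, N = 3 stains); values are
# illustrative assumptions only.
R = np.array([
    [0.65, 0.07, 0.27],
    [0.70, 0.99, 0.57],
    [0.29, 0.11, 0.78],
])

# Two synthetic pixels built from known proportions, then unmixed again.
A_true = np.array([[0.2, 0.5], [0.3, 0.0], [0.5, 0.5]])
S = R @ A_true
A_est = unmix_pixels(S, R)
```

With noise-free synthetic data and a full-rank R, the estimate reproduces the proportions used to build the pixels; real data would generally also call for non-negativity or sum-to-unity constraints.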

In other embodiments, the unmixing is accomplished using the method described in WO2014/195193, entitled "Image Adaptive Physiologically Plausible Color Separation," filed on May 28, 2014, the disclosure of which is incorporated herein by reference in its entirety. In general, WO2014/195193 describes an unmixing method that separates the component signals of the input image by using iteratively optimized reference vectors. In some embodiments, the image data from an assay is correlated with expected or ideal results specific to the characteristics of the assay to determine a quality metric. In the case of low image quality or poor correlation with the ideal results, one or more reference column vectors in the matrix R are adjusted, and the unmixing is repeated iteratively with the adjusted reference vectors until the correlation indicates a high-quality image that satisfies physiological and anatomical requirements. Anatomical, physiological, and assay information may be used to define rules that are applied to the measured image data to determine the quality metric. This information includes how the tissue was stained, which structures within the tissue were or were not intended to be stained, and the relationships between the structures, stains, and markers specific to the assay being processed. The iterative process yields stain-specific vectors that can generate images that accurately identify the target structures and biologically relevant information, are free of noise and unwanted spectra, and are therefore suitable for analysis. The reference vectors are adjusted within a search space. The search space defines the range of values that a reference vector can take to represent a stain. The search space may be determined by scanning a variety of representative training assays, including assays with known or commonly occurring problems, and determining a high-quality set of reference vectors for the training assays.

Protein biomarker detection

After one or more images are acquired, for example from serial tissue sections, using the imaging module 202 (step 310; see also step 410), and optionally unmixed using the unmixing module 203, cells satisfying predetermined criteria, such as a threshold protein biomarker expression level, are identified in the acquired images (or unmixed image channel images) (step 311; see also step 411). In some embodiments, the cell detection and classification module 204 can be used to detect and classify cells stained for the presence of protein biomarkers. After detection and classification, the scoring module 205 may be used, for example, to derive a staining intensity level or an expression level. The identified cells that meet the predetermined criteria are then used to identify a tissue region, such as a tumor tissue region.

For example, and in the context of a sample stained for the presence of the HER2 protein biomarker, cell membranes expressing HER2 protein may be detected, classified, and/or scored (see, e.g., fig. 8B, 9A, and 9B). In some embodiments, each detected and classified cell can then be evaluated, such that the level of staining intensity can be determined. In other embodiments, membrane staining may be assessed and scored by an automated scoring algorithm (e.g., 0, 1+, 2+, or 3+). For example, those detected cells that meet a minimum threshold staining intensity or score can be identified and used to determine a tumor tissue region (see fig. 8 and 9).

Automated cell detection and classification module

After image acquisition and/or unmixing, the input images or unmixed image channel images are provided to the cell detection and classification module 204 for automatic detection, identification, and/or classification of cells and/or nuclei. The programs and automated algorithms described herein may be adapted to identify and classify various types of cells or nuclei based on features within the input image, including identifying and classifying tumor cells, non-tumor cells, stromal cells, and lymphocytes.

One skilled in the art will recognize that the cell nucleus, cytoplasm, and cell membrane have different characteristics, and that differently stained tissue samples may exhibit different biological characteristics. Indeed, one skilled in the art will recognize that certain cell surface receptors may have a staining pattern that localizes to the cell membrane or cytoplasm. Thus, the "cell membrane" staining pattern is analytically different from the "cytoplasmic" staining pattern. Likewise, the "cytoplasmic" staining pattern is analytically different from the "nuclear" staining pattern. Each of these different staining patterns may be used as features to identify cells and/or nuclei.

U.S. patent No. 7,760,927, the disclosure of which is incorporated herein by reference in its entirety, describes a method of identifying, classifying and/or scoring cell nuclei, cell membranes and cytoplasm in images of biological samples having one or more stains. For example, US 7,760,927 describes an automated method for simultaneously identifying a plurality of pixels in an input image of biological tissue stained with a biomarker, comprising considering a first color plane of a plurality of pixels in the foreground of the input image, thereby simultaneously identifying cytoplasmic and cellular membrane pixels, wherein the input image has been processed to remove its background portion and counterstaining components; determining a threshold level between cytoplasmic and cell membrane pixels in the digital image foreground; and simultaneously determining whether the selected pixel is a cytoplasmic pixel, a cell membrane pixel, or a transition pixel in the digital image by the selected pixel in the foreground and its eight neighboring pixels using the determined threshold level.

Suitable systems and methods for automatically identifying biomarker positive cells in images of biological samples are also described in U.S. patent publication No. 2017/0103521, the disclosure of which is incorporated herein by reference in its entirety. For example, US2017/0103521 describes (i) reading into one or more memories a first digital image and a second digital image depicting the same region of a first slide comprising a plurality of tumor cells that have been stained by a first stain and a second stain; (ii) identifying a plurality of nuclei and positional information of the nuclei by analyzing light intensity in the first digital image; (iii) identifying a cell membrane comprising the biomarker by analyzing light intensity in the second digital image and analyzing location information of the identified cell nuclei; and (iv) identifying biomarker positive tumor cells in the region, wherein the biomarker positive tumor cells are a combination of an identified nucleus and an identified cell membrane surrounding the identified nucleus. Also disclosed within US2017/0103521 are methods of detecting staining with HER2 protein biomarkers or EGFR protein biomarkers.

In some embodiments, tumor nuclei are automatically identified by first identifying candidate nuclei and then automatically distinguishing between tumor nuclei and non-tumor nuclei. Various methods of identifying candidate nuclei in tissue images are known in the art. For example, candidate nuclei may be detected automatically by applying a radial symmetry-based method, such as to the hematoxylin image channel or to a biomarker image channel after unmixing (see Parvin, Bahram, et al., "Iterative voting for inference of structural saliency and characterization of subcellular events," Image Processing, IEEE Transactions on 16.3 (2007): 615-623, the disclosure of which is incorporated herein by reference in its entirety).

More specifically, in some embodiments, the image received as input is processed, for example to detect nucleus centers (seeds) and/or to segment the nuclei. For example, instructions may be provided to detect nucleus centers based on radial symmetry voting using the technique of Parvin (noted above). In some embodiments, nuclei are detected using radial symmetry to locate the centers of the nuclei, and the nuclei are then classified based on the intensity of staining around the cell centers. In some embodiments, the radial symmetry-based nucleus detection operation is performed as described in commonly assigned and co-pending patent application WO/2014/140085A1, which is incorporated herein by reference in its entirety. For example, an image magnitude may be computed within an image, and one or more votes at each pixel may be accumulated by adding the summation of the magnitude within a selected region. Mean-shift clustering may be used to find the local centers in the region, which represent the actual nuclear locations. Nucleus detection based on radial symmetry voting is performed on color image intensity data and makes explicit use of the a priori domain knowledge that nuclei are elliptically shaped blobs of varying size and eccentricity. To accomplish this, image gradient information is used for radial symmetry voting along with the color intensities in the input image, and is combined with an adaptive segmentation process to precisely detect and localize the nuclei. A "gradient" as used herein is, for example, the intensity gradient of a particular pixel calculated by taking into account the intensity value gradients of a set of pixels surrounding that pixel. Each gradient may have a particular "orientation" relative to a coordinate system whose x-axis and y-axis are defined by two orthogonal edges of the digital image. 
For example, nuclear seed detection involves defining a seed as a point that is assumed to lie inside a nucleus and that serves as the starting point for localizing the nucleus. The first step is to detect the seed points associated with each nucleus using a highly robust approach based on radial symmetry, which detects elliptically shaped blobs, i.e., structures resembling nuclei. In the radial symmetry approach, the gradient image is processed using a kernel-based voting procedure. A voting response matrix is created by processing each pixel that accumulates votes through the voting kernel. The kernel is based on the gradient direction computed at that particular pixel, the expected range of minimum and maximum nucleus sizes, and the voting kernel angle (typically in the range [π/4, π/8]). In the resulting voting space, local maximum locations whose vote values exceed a predetermined threshold are saved as seed points. Extraneous seeds may be discarded later, during subsequent segmentation or classification processes. Other methods are discussed in U.S. patent publication No. 2017/0140246, the disclosure of which is incorporated herein by reference in its entirety.
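
The voting idea can be illustrated with a deliberately simplified, single-pass sketch. It is not the iterative, cone-shaped-kernel voting of Parvin et al., nor the method of the cited applications: each strong-gradient pixel simply votes along a straight line in the direction opposite to its gradient, which assumes nuclei darker than their surroundings. The function names and all parameter values are hypothetical.

```python
import numpy as np


def radial_symmetry_votes(gray, r_min, r_max, mag_thresh):
    """Accumulate gradient-magnitude votes toward darker blob centers."""
    gy, gx = np.gradient(np.asarray(gray, dtype=float))
    mag = np.hypot(gx, gy)
    votes = np.zeros_like(mag)
    ys, xs = np.nonzero(mag > mag_thresh)
    for y, x in zip(ys, xs):
        # Unit vector pointing down the gradient, i.e., toward darker pixels.
        uy, ux = -gy[y, x] / mag[y, x], -gx[y, x] / mag[y, x]
        # Vote at every distance in the expected range of nucleus radii.
        for r in range(r_min, r_max + 1):
            vy, vx = int(round(y + r * uy)), int(round(x + r * ux))
            if 0 <= vy < votes.shape[0] and 0 <= vx < votes.shape[1]:
                votes[vy, vx] += mag[y, x]
    return votes


def seed_points(votes, vote_thresh):
    """Local maxima (8-neighborhood) of the vote map above a threshold."""
    h, w = votes.shape
    padded = np.pad(votes, 1, mode="constant", constant_values=-np.inf)
    is_max = votes > vote_thresh
    for dy in (-1, 0, 1):
        for dx in (-1, 0, 1):
            if dy or dx:
                neighbor = padded[1 + dy:1 + dy + h, 1 + dx:1 + dx + w]
                is_max &= votes >= neighbor
    return list(zip(*np.nonzero(is_max)))


# Synthetic example: one dark, smooth blob centered at (20, 20).
yy, xx = np.mgrid[0:41, 0:41]
blob = 1.0 - 0.8 * np.exp(-((yy - 20) ** 2 + (xx - 20) ** 2) / (2 * 6.0 ** 2))
votes = radial_symmetry_votes(blob, r_min=3, r_max=10, mag_thresh=0.05)
seeds = seed_points(votes, vote_thresh=0.5 * votes.max())
```

Because every voting line passes through the blob center, the votes pile up there, and the thresholded local maximum serves as the seed.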

Nuclei can also be identified using other techniques known to those of ordinary skill in the art. For example, an image magnitude may be computed from a particular image channel of one of the H&E or IHC images, and each pixel around a specified magnitude may be assigned a number of votes based on the summation of the magnitude within a region around that pixel. Alternatively, a mean-shift clustering operation may be performed to find the local centers within a voting image, which represent the actual locations of the nuclei. In other embodiments, nuclear segmentation may be used to segment the entire nucleus, based on the now-known centers of the nuclei, via morphological operations and local thresholding. In yet other embodiments, model-based segmentation may be used to detect nuclei (i.e., a shape model of the nuclei is learned from a training data set and used as prior knowledge to segment the nuclei in the test image).

In some embodiments, the nuclei are then segmented using thresholds computed individually for each nucleus. For example, Otsu's method may be used for segmentation in a region around an identified nucleus, since it is believed that the pixel intensity in the nuclear regions varies. As will be appreciated by those of ordinary skill in the art, Otsu's method determines an optimal threshold by minimizing the intra-class variance. More specifically, Otsu's method is used to automatically perform clustering-based image thresholding, i.e., the reduction of a grayscale image to a binary image. The algorithm assumes that the image contains two classes of pixels (foreground pixels and background pixels) that follow a bimodal histogram. It then calculates the optimal threshold separating the two classes such that their combined spread (intra-class variance) is minimal or, equivalently (because the sum of pairwise squared distances is constant), such that their inter-class variance is maximal.
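
A compact sketch of Otsu's method is given below for illustration. It works on a histogram of the pixel values in a region (for instance, a patch around one detected nucleus) and picks the split that maximizes the inter-class variance; the bin count and the function name `otsu_threshold` are choices made for this example.

```python
import numpy as np


def otsu_threshold(values, bins=256):
    """Threshold that maximizes the inter-class variance of the histogram."""
    values = np.asarray(values, dtype=float).ravel()
    hist, edges = np.histogram(values, bins=bins)
    centers = (edges[:-1] + edges[1:]) / 2
    # Pixel counts on either side of each candidate split.
    weight_lo = np.cumsum(hist)
    weight_hi = weight_lo[-1] - weight_lo
    cum_sum = np.cumsum(hist * centers)
    valid = (weight_lo > 0) & (weight_hi > 0)
    mean_lo = cum_sum / np.maximum(weight_lo, 1)
    mean_hi = (cum_sum[-1] - cum_sum) / np.maximum(weight_hi, 1)
    # Inter-class variance, up to a constant factor.
    between = np.where(valid, weight_lo * weight_hi * (mean_lo - mean_hi) ** 2, 0.0)
    # Use the upper edge of the last bin assigned to the low class.
    return edges[np.argmax(between) + 1]


# Synthetic patch: dark nucleus pixels and bright surrounding pixels.
rng = np.random.default_rng(0)
dark = rng.normal(0.2, 0.03, 500)
bright = rng.normal(0.8, 0.03, 500)
t = otsu_threshold(np.concatenate([dark, bright]))
nucleus_mask = np.concatenate([dark, bright]) < t
```

Computing the threshold per patch rather than once per image is what allows it to adapt to nuclei of different staining intensity.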

In some embodiments, the systems and methods further comprise automatically analyzing spectral and/or shape features of the identified nuclei in an image to identify nuclei of non-tumor cells. For example, in a first step, blobs may be identified in the first digital image. As used herein, a "blob" may be, for example, a region of a digital image in which some property, such as intensity or gray value, is constant or varies within a prescribed range of values. All pixels in a blob can be considered, in some sense, similar to each other. For example, blobs may be identified using differential methods, which are based on derivatives of a function of position on the digital image, and methods based on local extrema. A nuclear blob is a blob whose pixels and/or outline shape indicate that the blob was probably generated by a cell nucleus stained with the first stain. For example, the radial symmetry of a blob may be evaluated to determine whether the blob should be identified as a nuclear blob or as another structure, such as a staining artifact. For example, a blob that is elongated and lacks radial symmetry may be identified not as a nuclear blob but as a staining artifact. Depending on the embodiment, a blob identified as a "nuclear blob" may represent a set of pixels identified as a candidate nucleus, which may be further analyzed to determine whether the nuclear blob represents a nucleus. In some embodiments, any kind of nuclear blob is used directly as an "identified nucleus". In some embodiments, a filtering operation is applied to the identified nuclei or nuclear blobs to identify nuclei that do not belong to biomarker-positive tumor cells, and either to remove these identified non-tumor nuclei from the list of identified nuclei or to refrain from adding them to that list in the first place. 
For example, additional spectral and/or shape features of an identified nuclear blob may be analyzed to determine whether the nucleus or nuclear blob is the nucleus of a tumor cell. For example, the nucleus of a lymphocyte is larger than the nucleus of other tissue cells (e.g., lung cells). In the case where the tumor cells are derived from lung tissue, the nuclei of lymphocytes are identified by identifying all nuclear blobs whose minimum size or diameter is significantly larger than the average size or diameter of a normal lung cell nucleus. Identified nuclear blobs associated with lymphocyte nuclei may be removed (i.e., "filtered out") from the set of identified nuclei. Filtering out the nuclei of non-tumor cells can improve the accuracy of the method. Depending on the biomarker, non-tumor cells may also express the biomarker to some extent and may therefore produce an intensity signal in the first digital image that does not originate from tumor cells. By identifying nuclei that do not belong to tumor cells and filtering them out of the totality of identified nuclei, the accuracy of identifying biomarker-positive tumor cells can be improved. These and other methods are described in U.S. patent publication 2017/0103521, the disclosure of which is incorporated herein by reference in its entirety. In some embodiments, once the seeds are detected, a locally adaptive thresholding method may be used to create blobs around the detected centers. In some embodiments, other methods may also be incorporated; for example, a marker-based watershed algorithm may be used to identify the nuclear blobs around the detected nucleus centers. These and other methods are described in PCT publication No. WO2016/120442, the disclosure of which is incorporated herein by reference in its entirety.

Upon detecting the cell nucleus, features (or metrics) are derived from the input image. Deriving metrics from nuclear features is well known in the art, and any known nuclear feature may be used in the context of the present disclosure. Non-limiting examples of metrics that can be calculated include:

(A) metrics derived from morphological features

A "morphological feature" as used herein is, for example, a feature that is indicative of the shape or dimensions of a nucleus. Without wishing to be bound by any particular theory, it is believed that morphological features provide some vital information about the size and shape of a cell or its nucleus. For example, a morphological feature may be computed by applying various image analysis algorithms to pixels contained in or surrounding a nuclear blob or seed. In some embodiments, the morphological features include area, minor and major axis lengths, perimeter, radius, volume, and the like. At the cellular level, such features are used to classify a nucleus as belonging to a healthy or a diseased cell. At the tissue level, the statistics of these features over the tissue are exploited to classify the tissue as diseased or non-diseased.

(B) Metrics derived from color

In some embodiments, the metrics derived from color include color ratios, such as R/(R + G + B), or color principal components. In other embodiments, the metrics derived from color include local statistics of each of the colors (mean/median/variance/standard deviation) and/or color intensity correlations in a local image window.
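
As a small illustration, the color ratio R/(R + G + B) and the per-channel local statistics named above could be computed as follows. The function names are hypothetical, and the small `eps` term, added only to avoid division by zero on black pixels, is an assumption of this sketch.

```python
import numpy as np


def red_ratio(rgb, eps=1e-12):
    """Per-pixel color ratio R / (R + G + B) for an (..., 3) array."""
    rgb = np.asarray(rgb, dtype=float)
    return rgb[..., 0] / (rgb.sum(axis=-1) + eps)


def local_color_stats(rgb_window):
    """Per-channel statistics of the pixels in one local image window."""
    pixels = np.asarray(rgb_window, dtype=float).reshape(-1, 3)
    return {
        "mean": pixels.mean(axis=0),
        "median": np.median(pixels, axis=0),
        "variance": pixels.var(axis=0),
        "std": pixels.std(axis=0),
    }


# A 1 x 2 pixel window used as a toy example.
window = np.array([[[1.0, 2.0, 3.0], [3.0, 4.0, 5.0]]])
stats = local_color_stats(window)
ratios = red_ratio(window)
```

In practice the window would be a neighborhood around a detected nucleus, and the resulting values would be appended to that nucleus's feature vector.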

(C) Metrics derived from intensity features

A group of adjacent cells with certain specific property values is set up between the dark and white shades of the gray-colored cells represented in a histopathological slide image. The correlation of the color features defines an instance of the size class; in this way, the intensity of these colored cells distinguishes an affected cell from its surrounding cluster of dark cells.

(D) Metrics derived from spatial features

In some embodiments, the spatial features comprise a local density of cells; the average distance between two adjacent detected cells; and/or the distance from the cell to the segmented region.
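
Two of the spatial features listed above can be sketched from the detected cell centers alone. This sketch interprets "the average distance between two adjacent detected cells" as the mean nearest-neighbor distance and defines local density as neighbors per unit area within a fixed radius; both interpretations, the radius, and the function names are assumptions.

```python
import numpy as np


def pairwise_distances(centers):
    """Euclidean distance matrix for an (n, 2) array of cell centers."""
    pts = np.asarray(centers, dtype=float)
    diff = pts[:, None, :] - pts[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))


def mean_nearest_neighbor_distance(centers):
    """Average distance from each cell to its closest other cell."""
    dist = pairwise_distances(centers)
    np.fill_diagonal(dist, np.inf)  # ignore each cell's distance to itself
    return dist.min(axis=1).mean()


def local_density(centers, radius):
    """Neighbors within `radius` of each cell, per unit area."""
    dist = pairwise_distances(centers)
    neighbors = (dist <= radius).sum(axis=1) - 1  # exclude the cell itself
    return neighbors / (np.pi * radius ** 2)


centers = [(0, 0), (0, 3), (4, 0)]
avg_nn = mean_nearest_neighbor_distance(centers)
density = local_density(centers, radius=3.5)
```

The dense distance matrix is quadratic in the number of cells, so a spatial index would be preferable for whole-slide cell counts.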

Of course, other features known to those of ordinary skill in the art may also be considered and used as a basis for feature calculation.

In some embodiments, the cell detection and classification module 204 is run more than once. For example, the cell detection and classification module 204 is run a first time to extract features and classify cells and/or nuclei in a first image, and is then run a second time to extract features and classify cells and/or nuclei in a series of additional images, which may be other simplex images or unmixed image channel images, or any combination thereof.

After the features are derived, they may be used alone or in conjunction with training data to classify nuclei or cells (e.g., during training, example cells are presented together with a ground truth identification provided by an expert observer according to procedures known to those of ordinary skill in the art). In some embodiments, the system may include a classifier that is trained based at least in part on a set of training or reference slides for each biomarker. One skilled in the art will recognize that different sets of slides may be used to train the classifier for each biomarker. Accordingly, for a single biomarker, a single classifier is obtained after training. One skilled in the art will also recognize that, because of the variability between the image data obtained from different biomarkers, a different classifier can be trained for each biomarker to ensure better performance on unseen test data for which the biomarker type is known. The trained classifier may be selected based at least in part on how best to handle variability in the training data, for example in tissue type, staining protocol, and other features of interest, for slide interpretation.

Scoring module

In some embodiments, the scoring module 205 utilizes data collected during cell detection and classification. For example, as described herein, the cell detection and classification module 204 may include a series of image analysis algorithms and may be used to determine whether one or more nuclei, cell walls, tumor cells, or other structures are present within the identified cell clusters. In some embodiments, the derived staining intensity values and counts of specific nuclei per field can be used to determine various marker expression scores, such as percent positive or H-Score scores. Suitable scoring methods are described in U.S. patent publication No. 2017/0103521, the disclosure of which is incorporated herein by reference in its entirety.

For example, automated image analysis algorithms in the cell detection and classification module 204 may be used to interpret each IHC slide in the series to detect tumor cell nuclei that stain positively and negatively for a particular biomarker (e.g., Ki67, ER, PR, HER2, etc.). Based on the detected positive and negative tumor nuclei, various slide grade scores, such as percent marker positivity, H-Score scores, and the like, can be calculated using one or more methods.

In some embodiments, the expression score is an H-score. In some embodiments, the H-score is used to assess the percentage of tumor cells with cell membrane staining graded as "weak", "moderate", or "strong". The grades are summed to give an overall maximum score of 300, with a cut point of 100 distinguishing "positive" from "negative". For example, a membrane staining intensity (0, 1+, 2+, or 3+) is determined for each cell in a fixed field of view (or here, each cell in a tumor or cell cluster). The H-score may simply be based on a predominant staining intensity or, in a more complex approach, may comprise the sum of the individual H-scores for each intensity level seen. By one method, the percentage of cells at each staining intensity level is calculated, and an H-score is then assigned using the following formula: [1 x (% cells 1+) + 2 x (% cells 2+) + 3 x (% cells 3+)]. The final score, ranging from 0 to 300, gives more relative weight to higher-intensity membrane staining in a given tumor sample. The sample may then be considered positive or negative based on a specific discriminatory threshold. Additional methods of calculating the H-score are described in U.S. patent publication No. 2015/0347702, the disclosure of which is incorporated herein by reference in its entirety.
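By way of illustration only, the H-score formula above can be sketched as follows. The function names, the per-cell input format, and the inclusive treatment of the cut point are assumptions made for this sketch and are not taken from the cited publications.

```python
from collections import Counter

def h_score(grades):
    """Compute an H-score from per-cell staining grades (0, 1, 2, or 3).

    Implements 1 x (% cells 1+) + 2 x (% cells 2+) + 3 x (% cells 3+).
    """
    if not grades:
        raise ValueError("at least one cell is required")
    if any(g not in (0, 1, 2, 3) for g in grades):
        raise ValueError("grades must be 0, 1, 2, or 3")
    counts = Counter(grades)
    n = len(grades)
    # Percentage of cells at each non-zero intensity level.
    pct = {level: 100.0 * counts.get(level, 0) / n for level in (1, 2, 3)}
    return 1 * pct[1] + 2 * pct[2] + 3 * pct[3]

def is_positive(score, cut_point=100.0):
    # Assumption: a score exactly at the cut point counts as positive.
    return score >= cut_point

# Hypothetical cell cluster: 40% unstained, 30% 1+, 20% 2+, 10% 3+.
cells = [0] * 40 + [1] * 30 + [2] * 20 + [3] * 10
score = h_score(cells)  # 1*30 + 2*20 + 3*10
```

Because the weights grow with intensity, a sample in which every cell stains 3+ reaches the maximum of 300, while an unstained sample scores 0.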

In some embodiments, the expression score is an Allred score. The Allred score is a scoring system that considers the percentage of cells that test positive for hormone receptors, along with how well the receptors show up after staining (referred to as "intensity"). This information is then combined to score the sample on a scale from 0 to 8. It is believed that the higher the score, the more receptors are found and the easier they are to see in the sample.

In other embodiments, the expression score is percent positivity. Again, in the context of scoring a breast cancer sample stained for the PR and Ki-67 biomarkers, percent positivity is calculated for a single slide (e.g., the total number of nuclei of cells (e.g., malignant cells) that stain positive in each field of view of the digital image of the slide is summed and divided by the total number of positively and negatively stained nuclei from each field of view of the digital image) as follows: percent positivity = number of positively stained cells / (number of positively stained cells + number of negatively stained cells).
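As a minimal sketch of this calculation, the per-field counts can be pooled before dividing. The function name and the input format (one pair of counts per field of view) are hypothetical, and the result is multiplied by 100 to express it as a percentage.

```python
def percent_positive(fields):
    """Pool nucleus counts over all fields of view and return percent positivity.

    fields: list of (n_positive, n_negative) tuples, one per field of view.
    """
    fields = list(fields)
    positive = sum(p for p, _ in fields)
    negative = sum(n for _, n in fields)
    total = positive + negative
    if total == 0:
        raise ValueError("no nuclei were counted")
    return 100.0 * positive / total

# Two hypothetical fields of view from a single slide.
slide_fields = [(30, 70), (20, 80)]
slide_percent = percent_positive(slide_fields)
```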

In other embodiments, the expression score is an immunohistochemistry combination score, which is a prognostic score based on a number of IHC markers, wherein the number of markers is greater than one. Such combination scores are described in U.S. patent publication No. 2017/0082627, the disclosure of which is incorporated herein by reference in its entirety.

Tissue region identification module

After identifying cells that meet a threshold protein biomarker expression level (step 311; see also step 411), a tissue region (e.g., a tumor tissue region) encompassing the identified cells is derived (step 312; see also step 412), for example, by using the tissue region identification module 206. For example, upon identifying a single cell with minimal HER2 cell membrane staining intensity, a tumor tissue region encompassing the identified cell can be derived.

In some embodiments, tissue type identification is performed according to the method described in PCT publication No. WO2015/113895, entitled "Adaptive Classification for Whole Slide Tissue Segmentation" and filed on January 23, 2015, the disclosure of which is incorporated herein by reference in its entirety. In general, PCT publication No. WO2015/113895 describes segmenting tumor regions from other regions in an image by operations related to region classification, including: identifying grid points in the tissue image; classifying the grid points as one of a plurality of tissue types and generating classified grid points based on a database of known tissue type features; assigning the classified grid points at least one of a high confidence score and a low confidence score; modifying the database of known tissue type features based on the grid points that were assigned a high confidence score, thereby generating a modified database; and reclassifying the grid points that were assigned a low confidence score based on the modified database, so as to segment the tissue (e.g., to identify tissue regions in the image).

Alternatively, or in addition, tumor regions or other regions may be detected automatically using image analysis operations such as segmentation, thresholding, and edge detection, and FOVs may be generated automatically based on the detected regions.

In some embodiments, a tissue region mask may be derived. U.S. patent application publication No. 2017/0154420, the disclosure of which is incorporated herein by reference in its entirety, describes a method of generating such a tissue region mask.

Automated image registration module

After identifying a tissue region in a first image, e.g., an image having a signal corresponding to one or more protein biomarkers (step 312; see also step 412), the identified tissue region is mapped to a second image, e.g., an image having a signal corresponding to one or more nucleic acid biomarkers (step 313; see also step 413). Mapping of identified tissue regions is particularly useful where sequential tissue sections are utilized, such as a first sequential section stained for the presence of one or more protein biomarkers and a second sequential section stained for the presence of one or more nucleic acid biomarkers. In this way, the mapping process is able to identify corresponding structures, cells and tissues in each successive slice despite the differences between successive slices of tissue.

Generally, registration involves selecting one input image, or a portion thereof (e.g., a cell cluster), as a reference image and computing the transformation of each other input image into the coordinate system of the reference image. Accordingly, all input images may be aligned to the same coordinate system by image registration (e.g., in the case of serial tissue sections, the reference coordinates may be those of the section in the middle of the tissue block or of the slide bearing a specific marker). Each image can thus be registered from its old coordinate system to the new reference coordinate system.

Registration is the process of transforming different data sets (here, images or clusters of cells within images) into one coordinate system. More specifically, registration is the process of aligning two or more images and generally involves designating one image as the reference (also referred to as the reference image or the fixed image) and applying geometric transformations to the other images so that they align with the reference. A geometric transformation maps a location in one image to a new location in another image. Determining the correct geometric transformation parameters is the key step of the image registration process. Methods of computing the transformation of each image to a reference image are well known to those skilled in the art. For example, an image registration algorithm is described in the proceedings of the 11th International Symposium on Biomedical Imaging (ISBI), IEEE, April 29 to May 2, 2014, the disclosure of which is hereby incorporated by reference in its entirety.
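To make the notion of a geometric transformation concrete, the following sketch maps the outline of a region from one image into the coordinate system of another using a similarity transform (rotation, uniform scale, translation). The transform type, the function names, and the parameter values are assumptions chosen for illustration; the disclosure does not restrict registration to this form, and estimating the parameters is not shown.

```python
import math

def make_similarity(angle_deg, scale, tx, ty):
    """Build a 2 x 3 matrix for rotation by angle_deg, uniform scale, then translation."""
    a = math.radians(angle_deg)
    c, s = scale * math.cos(a), scale * math.sin(a)
    return ((c, -s, tx), (s, c, ty))

def map_point(transform, x, y):
    """Map a location (x, y) in the moving image to the reference image."""
    (a, b, tx), (c, d, ty) = transform
    return (a * x + b * y + tx, c * x + d * y + ty)

def map_region(transform, polygon):
    """Map every vertex of a region outline into the reference coordinate system."""
    return [map_point(transform, x, y) for x, y in polygon]

# Hypothetical parameters: 90 degree rotation, no scaling, shift of 10 pixels in x.
T = make_similarity(90.0, 1.0, 10.0, 0.0)
region = [(0.0, 0.0), (1.0, 0.0), (1.0, 1.0), (0.0, 1.0)]
mapped_region = map_region(T, region)
```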

Any registration method may be used in the systems and methods disclosed herein. In some embodiments, image registration is performed using the methods described in WO/2015/049233, entitled "Line-Based Image Registration and Cross-Image Annotation Devices, Systems and Methods," filed on September 30, 2014, the disclosure of which is hereby incorporated by reference in its entirety. WO/2015/049233 describes a registration process comprising a coarse registration process used alone or in combination with a fine registration process. In some embodiments, the coarse registration process may include selecting digital images for alignment, generating a foreground image mask from each of the selected digital images, and matching tissue structure between the resulting foreground images. In other embodiments, generating a foreground image mask includes generating a soft-weighted foreground image from a whole slide image of a stained tissue section and applying Otsu thresholding to the soft-weighted foreground image to produce a binary soft-weighted image mask. In further embodiments, generating a foreground image mask includes generating a binary soft-weighted image mask from a whole slide image of a stained tissue section, separately generating a gradient magnitude image mask from the same whole slide image, applying Otsu thresholding to the gradient magnitude image mask to produce a binary gradient magnitude image mask, and combining the binary soft-weighted image mask and the binary gradient magnitude image mask using a binary OR operation to generate the foreground image mask. The term "gradient" as used herein refers to the intensity gradient of a particular pixel, calculated by taking into account the intensity value gradients of a set of pixels surrounding that pixel. Each gradient may have a particular "direction" relative to a coordinate system whose x-axis and y-axis are defined by two orthogonal edges of the digital image. A "gradient direction feature" may be a data value indicating the direction of the gradient within that coordinate system. In some embodiments, matching tissue structure includes computing line-based features from the boundary of each of the resulting foreground image masks, computing global transformation parameters between a first set of line features on a first foreground image mask and a second set of line features on a second foreground image mask, and globally aligning the first and second images based on the transformation parameters. In yet other embodiments, the coarse registration process includes mapping the selected digital images, based on the global transformation parameters, to a common grid, which may encompass the selected digital images. In some embodiments, the fine registration process may include identifying a first sub-region of a first digital image in the set of aligned digital images; identifying a second sub-region on a second digital image in the set of aligned digital images, wherein the second sub-region is larger than the first sub-region and the first sub-region is located substantially within the second sub-region on the common grid; and computing an optimized location for the first sub-region in the second sub-region.
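The foreground mask generation described above (Otsu thresholding of two images followed by a binary OR) can be sketched as follows. This is a simplified, pure-Python illustration on tiny 8-bit images; the function names and sample values are hypothetical, and the computation of the soft-weighted and gradient magnitude images themselves is not shown.

```python
def otsu_threshold(values, levels=256):
    """Return the gray level that maximizes between-class variance (Otsu's method)."""
    hist = [0] * levels
    for v in values:
        hist[v] += 1
    total = len(values)
    sum_all = sum(i * h for i, h in enumerate(hist))
    best_t, best_var = 0, -1.0
    w0, sum0 = 0, 0
    for t in range(levels):
        w0 += hist[t]
        sum0 += t * hist[t]
        if w0 == 0:
            continue
        w1 = total - w0
        if w1 == 0:
            break
        mean0 = sum0 / w0
        mean1 = (sum_all - sum0) / w1
        between = w0 * w1 * (mean0 - mean1) ** 2
        if between > best_var:
            best_t, best_var = t, between
    return best_t

def binarize(image):
    """Threshold a 2-D image with Otsu's method; pixels above the threshold become 1."""
    t = otsu_threshold([v for row in image for v in row])
    return [[1 if v > t else 0 for v in row] for row in image]

def binary_or(mask_a, mask_b):
    """Merge two binary masks of equal size with a pixel-wise OR."""
    return [[a | b for a, b in zip(ra, rb)] for ra, rb in zip(mask_a, mask_b)]

# Hypothetical 2 x 3 soft-weighted and gradient magnitude images.
soft_weighted = [[10, 10, 200], [10, 10, 10]]
gradient_magnitude = [[5, 180, 5], [5, 5, 5]]
foreground_mask = binary_or(binarize(soft_weighted), binarize(gradient_magnitude))
```

The OR operation keeps a pixel as foreground if either cue marks it, so tissue that is weakly stained but has strong edges is still retained in the mask.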

These methods are illustrated in FIG. 6 herein, where method 600 begins at start block 602. At block 604, a set of image data or digital images is acquired (e.g., scanned or selected from a database) for manipulation. Each set of image data includes image data corresponding to, for example, a tissue section from a set of adjacent tissue sections of a single patient. At block 606, if only a single image pair is selected, the process proceeds directly to block 610. If more than a single pair of images is selected, the set of selected images is grouped into pairs at block 608 before proceeding to block 610. In some embodiments, image pairs are selected as adjacent pairs. Thus, for example, if the set of selected images includes 10 parallel, adjacent sections (L1 ... L10), then L1 and L2 are grouped as a pair, L3 and L4 are grouped as a pair, and so on. On the other hand, if no information is available as to which pairs of images are most similar to each other, then, in some embodiments, the images are grouped according to their distance apart (e.g., an inter-edge or inter-image distance corresponding to the chamfer distance between the edge maps of the various images), and the images that are closest to one another are paired together. In exemplary embodiments of the present disclosure, an inter-edge/inter-image distance is used to pair the images. In some embodiments, an edge-based chamfer distance may be used to compute the inter-image/inter-edge distance. If the pair of images has previously undergone a coarse registration process, such that the images have been coarsely aligned and the results saved, the process advances to block 614. Otherwise, a coarse registration process is performed on the selected image pair at block 612. The coarse registration process will be described in further detail below.
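The adjacent-pair grouping of block 608 can be sketched as below. The function name is hypothetical, and rejecting an odd number of images is an assumption of this sketch, since the text does not say how an unpaired section is handled; distance-based pairing is not shown.

```python
def pair_adjacent(images):
    """Group an ordered list of serial sections into adjacent pairs."""
    if len(images) % 2 != 0:
        # Assumption: an unpaired trailing section is treated as an error.
        raise ValueError("an even number of images is required")
    return [(images[i], images[i + 1]) for i in range(0, len(images), 2)]

sections = ["L%d" % i for i in range(1, 11)]  # L1 ... L10
pairs = pair_adjacent(sections)
```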

Turning to block 614, the selected, and now registered (aligned), images are displayed on a common grid, with the images overlaid in a single image, displayed as separate images, or both, on a single monitor or spread across several monitors. At block 616, the client user may select one of the images from the pair as the source image. If the source image has already been annotated as desired, the process proceeds to block 622. Otherwise, the client user annotates the source image as desired at block 620. At block 622, which may (or may not) occur substantially simultaneously with block 620, the annotation is mapped to the other image in the pair (the target image) and graphically reproduced on the target image. In embodiments where annotation occurs prior to coarse registration, the annotation may be mapped from the source image to the target image at substantially the same time as the pair of images is registered (aligned). At block 624, the user may choose whether to perform a fine registration process. If the user chooses to display the results directly without performing fine registration, the process proceeds to block 626.

Otherwise, a fine registration process is performed on the selected image pair at block 624, e.g., to optimize the location of the mapped annotations and/or the alignment of the images. The fine registration process will be discussed in further detail below. At block 626, the annotated image pair is displayed with the results of the fine registration process (or, if fine registration is not performed, with the results of the coarse registration process only). The method then ends at block 628.

Automated nucleic acid biomarker detection

After mapping the identified tissue region from the first image to the second image (step 313; see also step 413), the signals (or points) corresponding to one or more nucleic acid biomarkers can be identified in the mapped tissue region in the second image using the point detection module 208 and the point classification module 209 (step 314; see also step 414). The point count and classification module 210 may then be used to interpret the identified signals to identify genetic aberrations in the nucleus for assessment (step 315; see also step 415).

In some embodiments, modules 208, 209, and 210 are configured to detect a signal corresponding to at least one nucleic acid biomarker in each mapped tissue region of the second image. Based on the detection of the signal corresponding to the at least one nucleic acid biomarker, it may be assessed whether cells or nuclei within the mapped tissue region have a genetic aberration (e.g., abnormally high copy number; chromosomal abnormality). In some embodiments, the genetic aberration of each cell nucleus is assessed by determining whether a total number of identified points in the cell nucleus corresponding to one or more signals from at least one nucleic acid biomarker meets a predetermined threshold. For example, in the case of a biological sample stained with a single nucleic acid probe, the spots corresponding to the signals from the single nucleic acid probe can be detected, classified, and then counted. The number of copies present in each nucleus can then be compared to a predetermined threshold.

In other embodiments, the genetic aberration of a nucleus is assessed by: calculating a ratio of identified first points corresponding to a first nucleic acid biomarker to identified second points corresponding to a second nucleic acid biomarker; and comparing the calculated ratio for each nucleus to a predetermined threshold. In some embodiments, points are identified for each nucleus, e.g., the points corresponding to each different signal type are identified in each nucleus. For example, in the case of a biological sample stained with two nucleic acid probes, the points corresponding to the signal from each probe (e.g., black or red points in the context of staining for HER2 and chromosome 17) can be detected and classified, and a ratio can be calculated based on the total number of points corresponding to each probe, which is ultimately used to determine whether a genetic aberration is present within the nucleus (see FIGS. 10 and 11).
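A minimal sketch of the per-nucleus ratio test follows. The function names, the sample counts, and the threshold value of 2.0 are hypothetical and serve only to show the comparison; the clinically relevant threshold depends on the assay. Returning no ratio when the second count is zero is likewise an assumption.

```python
def point_ratio(first_points, second_points):
    """Ratio of first-probe points to second-probe points in one nucleus."""
    if second_points == 0:
        return None  # assumption: the ratio is undefined without reference points
    return first_points / second_points

def is_aberrant(ratio, threshold=2.0):
    """Compare a nucleus ratio to a predetermined (here hypothetical) threshold."""
    return ratio is not None and ratio >= threshold

# Hypothetical per-nucleus counts: (first-probe points, second-probe points).
nuclei = {"nucleus_1": (8, 2), "nucleus_2": (3, 2), "nucleus_3": (5, 0)}
calls = {name: is_aberrant(point_ratio(a, b)) for name, (a, b) in nuclei.items()}
```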

By way of another example, a biological sample can be stained with an EGFR/CEP 7 dual probe, and the ratio of the EGFR gene to chromosome 7 can be calculated and compared to a clinically relevant threshold. From this comparison, the following may be observed: disomy (score of 1), low trisomy (score of 2), high trisomy (score of 3), low polysomy (score of 4), high polysomy (score of 5), and amplification (score of 6) (see Dahle-Smith et al., "Epidermal Growth Factor (EGFR) copy number aberrations in esophageal and gastro-esophageal junctional carcinoma," Mol Cytogenet, 2015; 8:78, the disclosure of which is hereby incorporated by reference in its entirety).

Further non-limiting examples of point detection, point classification, and point counting in the context of staining for the presence of HER2 and chromosome 17 are described herein and illustrated in FIG. 10. Again, the systems and methods described herein are not limited to the use of a HER2 dual ISH assay to determine genetic aberrations; rather, this example is provided for illustrative purposes only.

Point detection module

In general, point detection is performed with the point detection module 208 to identify all point pixels in the input image (e.g., see FIG. 11). The detected points are then provided to a point classification module 209 for further processing and analysis.

In some embodiments, point detection may be performed using one or more different derived image features including, but not limited to, absorbance, multi-scale difference of Gaussians (DoG), and features from the unmixed image channels obtained after color deconvolution. By considering multiple features such as these, point detection according to the methods of the present disclosure is accurate and robust. In some embodiments, methods are employed that are capable of specifically detecting signals corresponding to black and/or red points, such as those produced in a HER2 dual ISH assay. One skilled in the art will also recognize that, while certain examples herein may refer to biomarkers labeled with chromogens (e.g., the black and red points of a HER2 dual ISH assay), point detection may also be performed on biomarkers labeled with fluorophores (e.g., FISH).

In some embodiments, point detection is performed using the method described in U.S. patent publication No. 2014/0377753, the disclosure of which is incorporated herein by reference in its entirety. For example, points may first be identified by converting a color image of the cell to a monochrome image. In one embodiment, the monochrome image is created by first converting the color image of the cell from the RGB color space to the L*a*b* color space. In the L*a*b* color space, the "L" channel represents the lightness of a pixel, the "a" channel reflects the red and green components of the pixel, and the "b" channel represents the blue and yellow components of the pixel. A new image that accentuates the red and black colors is then created by linearly combining the "L", "a", and "b" values at each pixel location. In some embodiments, the points in the red- and black-enhanced image are detected by passing the enhanced image through a series of filters.

In some embodiments, the filters are difference of Gaussians ("DoG") filters, where the size of each filter is selected based on the expected size of the points or point blobs to be detected. In general, difference of Gaussians is a feature enhancement algorithm that involves subtracting one blurred version of an original image from another, less blurred version of the original. In the simple case of a grayscale image, the blurred images are obtained by convolving the original grayscale image with Gaussian kernels having different standard deviations. It is believed that blurring an image with a Gaussian kernel suppresses only high-frequency spatial information. Subtracting one image from the other preserves the spatial information that lies between the ranges of frequencies preserved in the two blurred images. The difference of Gaussians thus behaves as a band-pass filter that discards all but a band of the spatial frequencies present in the original grayscale image. In some embodiments, the DoG filters range in size from about 0.05 microns to about 5 microns. In some embodiments, the results of each pass through a DoG filter are combined to create a grayscale image that, after filtering, can be used as a mask representing the stained nuclear material and some "garbage" within each cell. U.S. patent application publication No. 2017/0337695, the disclosure of which is incorporated herein by reference in its entirety, describes a method of generating such a mask. The combined grayscale image may then be binarized using an adaptive thresholding technique, e.g., one based on the Otsu method, to produce a point mask image, in which each pixel outside a point has one binary value (e.g., logic 0) and each pixel inside a point has the opposite binary value (e.g., logic 1).
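The band-pass behavior of a DoG filter can be illustrated with a short, pure-Python sketch on a small grayscale image. The function names, the kernel truncation at three standard deviations, the edge replication, and the choice of sigmas (in pixels, not microns) are all assumptions of this sketch; a production system would typically rely on an optimized image-processing library instead.

```python
import math

def gaussian_kernel(sigma):
    """Normalized 1-D Gaussian kernel truncated at 3 sigma (assumed cutoff)."""
    radius = max(1, int(math.ceil(3 * sigma)))
    weights = [math.exp(-(i * i) / (2.0 * sigma * sigma))
               for i in range(-radius, radius + 1)]
    total = sum(weights)
    return [w / total for w in weights]

def blur_1d(row, kernel):
    """Convolve one row with the kernel, replicating edge samples."""
    r, n = len(kernel) // 2, len(row)
    out = []
    for i in range(n):
        acc = 0.0
        for k, w in enumerate(kernel):
            j = min(max(i + k - r, 0), n - 1)
            acc += w * row[j]
        out.append(acc)
    return out

def blur_2d(image, sigma):
    """Separable Gaussian blur: filter the rows, then the columns."""
    kernel = gaussian_kernel(sigma)
    rows = [blur_1d(row, kernel) for row in image]
    cols = [blur_1d(list(col), kernel) for col in zip(*rows)]
    return [list(row) for row in zip(*cols)]

def difference_of_gaussians(image, sigma_narrow, sigma_wide):
    """Subtract the more blurred image from the less blurred one."""
    less = blur_2d(image, sigma_narrow)
    more = blur_2d(image, sigma_wide)
    return [[a - b for a, b in zip(ra, rb)] for ra, rb in zip(less, more)]

# A 9 x 9 test image with a single bright pixel standing in for a point signal.
image = [[0.0] * 9 for _ in range(9)]
image[4][4] = 1.0
dog = difference_of_gaussians(image, 1.0, 2.0)
```

In this example the response peaks at the bright pixel, while a uniformly bright image yields a response of approximately zero everywhere, which is why the filter highlights point-sized structures rather than broad staining.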

In other embodiments, detecting first and second points representing in situ hybridization signals of different colors includes: generating a first color channel image and a second color channel image by color deconvolution of the digital image (e.g., using the unmixing module 203), the first color channel image corresponding to the color spectrum contribution of the first stain and the second color channel image corresponding to the color spectrum contribution of the second stain; computing at least one DoG (difference of Gaussians) image from the digital image by applying to the digital image a pair of Gaussian filters whose kernels have different standard deviations and subtracting the two filtered images output by the Gaussian filters from each other; computing an absorbance image from the image of the tissue sample; detecting sets of adjacent pixels in the digital image whose absorbance values in the absorbance image exceed an absorbance threshold and whose DoG values in the DoG image exceed a DoG threshold, and using the detected sets of adjacent pixels as prospective points; identifying those prospective points whose intensity values in the first color channel image exceed a first color intensity threshold and outputting them as the detected first points; and identifying those prospective points whose intensity values in the second color channel image exceed a second color intensity threshold and outputting them as the detected second points.

In other embodiments, in the context of a sample stained with a HER2 dual ISH assay, point detection is performed using the method described in U.S. patent publication No. 2017/0323148, the disclosure of which is incorporated by reference herein in its entirety. According to the method described in the '148 publication, point detection is achieved by computing pixels based on absorbance image features, difference of Gaussians (DoG) image features, and unmixed color image channel features, where the pixels are computed by evaluating whether certain features within the image satisfy predetermined threshold criteria. In some embodiments, the threshold criteria are derived empirically after extensive experimentation. Here, pixels are computed based on features that satisfy predetermined threshold criteria, namely: point pixels, for which both the DoG and the absorbance are sufficiently high; black point pixels, for which both the black unmixed image intensity and the DoG are sufficiently high; and red point pixels, for which both the red unmixed image intensity and the DoG are sufficiently high. In some embodiments, a plurality of subsets of pixels are computed, each subset satisfying a different threshold criterion, such that the final subset of pixels is strong with respect to all of the threshold criteria. After the final subset of pixels is computed, a "fill holes" operation is performed, as known to those skilled in the art.
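The three threshold criteria can be sketched per pixel as follows. The function name, the feature scales, and every threshold value are hypothetical placeholders (the source states only that the criteria are derived empirically), and "sufficiently high" is read here as greater than or equal to the threshold.

```python
# Hypothetical thresholds; real values would be tuned empirically per assay.
THRESHOLDS = {"dog": 0.05, "absorbance": 0.30, "black": 0.50, "red": 0.50}

def pixel_labels(dog, absorbance, black_intensity, red_intensity, thr=THRESHOLDS):
    """Return the set of criteria that a pixel satisfies."""
    labels = set()
    if dog >= thr["dog"] and absorbance >= thr["absorbance"]:
        labels.add("point")   # DoG and absorbance both sufficiently high
    if dog >= thr["dog"] and black_intensity >= thr["black"]:
        labels.add("black")   # black unmixed intensity and DoG both high
    if dog >= thr["dog"] and red_intensity >= thr["red"]:
        labels.add("red")     # red unmixed intensity and DoG both high
    return labels

strong_black = pixel_labels(dog=0.10, absorbance=0.40, black_intensity=0.70, red_intensity=0.10)
flat_region = pixel_labels(dog=0.01, absorbance=0.90, black_intensity=0.90, red_intensity=0.90)
```

Requiring a high DoG response in every criterion means that broad, uniformly dark regions are rejected even when their absorbance or channel intensity is high.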

Those skilled in the art will recognize that different criteria may be established for different ISH probes, assays and protocols to accommodate different signals (e.g., different colors) in the image and to optimize spot detection for that particular ISH protocol accordingly.

Point classification module

After point detection by the point detection module 208, the system 200 runs the point classification module 209 to assign a color value to every point pixel (e.g., a black or red color value in the context of a HER2 dual ISH assay).

In some embodiments, point classification is accomplished using the method described in U.S. patent publication No. 2014/0377753, the disclosure of which is incorporated herein by reference in its entirety. According to the method of the '753 publication, once a point mask image is created (as described above), it is used in conjunction with a classifier (e.g., in the cell detection and classification module 204) to remove any points associated with "garbage" and retain the points representing signals corresponding to the target stains (e.g., black and red signals in the context of a HER2 dual ISH assay). The '753 publication describes that, in some embodiments, a linear binary classifier can be utilized. In a first stage of using the classifier, the computer system executes instructions to remove points with a weak DoG response, based on a response histogram determined for each cell analyzed. In a second stage, strong red points are separated from faint red points, and black points from dark blue points, by analyzing the color of the RGB image of the tissue at the pixel locations corresponding to the region within each point of the point mask image. In this way, a set containing only red and black points is obtained. The remaining points are considered "garbage" and are removed. Points and blobs are then extracted by connected component analysis. To determine whether a point represents a target signal (e.g., the HER2 gene or chromosome 17), multiple metrics of the point (including blobs) are measured and analyzed. These metrics include the size, color, orientation, and shape of the point, the response of the multiple difference of Gaussians filters, the relationship or distance between adjacent points, and other factors that can be measured by the computer. The metrics are then input into the classifier. In some embodiments, the classifier has been previously trained on a training data set that a trained pathologist, for example, has positively identified as representing the particular genes under study (e.g., the HER2 gene or chromosome 17). In some embodiments, the classifier has been trained on a set of training slides containing a range of point variability, and a linear margin binary classifier model is learned at each stage. The resulting model, parameterized by a discriminating hyperplane, divides the feature space into two labeled regions that identify a point as a first gene or a second gene (e.g., the HER2 gene or chromosome 17). Once the classifier is trained, the metrics measured for an unknown point in an image are applied to the classifier. The classifier can then indicate the type of point represented (e.g., the HER2 gene or chromosome 17).

In other embodiments, in the context of a sample stained with a HER2 dual ISH assay, point classification is performed using the method described in U.S. patent publication No. 2017/0323148, the disclosure of which is incorporated by reference herein in its entirety. According to the method of the '148 publication, since multiple different spectral signals may coexist in a single pixel of an ISH image, a color deconvolution algorithm is run on the image, in which each pixel is unmixed (e.g., using the unmixing module 203), in the optical density domain, into three component channels (e.g., red, black, and dark blue channels). After color deconvolution and training of the classification module/classifier, the point pixels are classified by a computer device or system in a manner known to those skilled in the art. In some embodiments, the classification module is a support vector machine ("SVM") (as described herein with respect to the cell detection and classification module 204). In the context of HER2 dual ISH, the detected points are classified as red, black, and/or blue points. For example, when the contribution of the black channel after color deconvolution is significantly larger than the contributions of the red and blue channels, the point pixel is more likely to be classified as black; when the contribution of the red channel after color deconvolution is significantly larger than the contributions of the blue and black channels, the point pixel is more likely to be classified as red.
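The "dominant channel" intuition in the preceding example can be sketched with a simple rule. This is a rule-of-thumb stand-in and not the trained SVM referred to above; the function name and the margin factor that quantifies "significantly larger" are assumptions of this sketch.

```python
def classify_point_pixel(red, black, blue, margin=1.5):
    """Label a point pixel by its dominant unmixed channel contribution.

    Returns 'red', 'black', or 'blue' when the largest contribution is at
    least `margin` times the second largest; otherwise returns None.
    """
    contributions = {"red": red, "black": black, "blue": blue}
    ranked = sorted(contributions.items(), key=lambda item: item[1], reverse=True)
    (top_name, top_value), (_, second_value) = ranked[0], ranked[1]
    if top_value > 0 and top_value >= margin * second_value:
        return top_name
    return None  # ambiguous pixel: no channel clearly dominates

black_pixel = classify_point_pixel(red=0.10, black=0.80, blue=0.20)
red_pixel = classify_point_pixel(red=0.70, black=0.10, blue=0.20)
mixed_pixel = classify_point_pixel(red=0.40, black=0.45, blue=0.10)
```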

In some embodiments, after point detection and classification, a refinement procedure may also be run that includes a series of refinement operations to enhance and clarify the initial point detection and/or classification operations. These steps are described in U.S. patent application publication No. 2017/0323148, the disclosure of which is incorporated herein by reference in its entirety. Those skilled in the art will recognize that some or all of these additional steps or procedures may be applied to further enhance the point detection and classification results provided above. Although the' 148 publication discloses steps relating to the dual ISH for HER2 detection, one skilled in the art will be able to adapt and modify these steps to accommodate any ISH probe or assay used. Those skilled in the art will also recognize that all of the operations of the' 148 publication need not be performed in connection with refining classification, and that those skilled in the art will be able to select the appropriate operation based on the output of the detection and classification module and the images provided to the system.

Point counting and classification module

Following point detection, classification, and optional refinement, the points are counted. In some embodiments, data may be compiled based on the number of points counted. For example, the points may be classified based on the counts, and/or a ratio of the number of first points (e.g., black points) to the number of second points (e.g., red points) may be calculated and used for classification or for determining genetic aberrations.

In some embodiments, point counting is accomplished using the method described in U.S. patent publication No. 2017/0323148, the disclosure of which is incorporated herein by reference in its entirety. For example, a connected component labeling process may be applied to all first point pixels (e.g., red point pixels) to identify first blobs (e.g., "red blobs"). Likewise, connected component labeling is applied to the second point pixels (e.g., black point pixels) to obtain second blobs (e.g., "black blobs"). In general, connected component labeling scans an image and groups its pixels based on pixel connectivity, i.e., all pixels in a connected component share similar pixel intensity values and are in some way connected with each other.
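A minimal sketch of connected component labeling on a binary point mask is given below. It uses a breadth-first flood fill with 8-connectivity; the function name and the choice of 8-connectivity are assumptions of this sketch, not details taken from the '148 publication.

```python
from collections import deque

def label_components(mask):
    """Label 8-connected groups of non-zero pixels; return (labels, count)."""
    height = len(mask)
    width = len(mask[0]) if height else 0
    labels = [[0] * width for _ in range(height)]
    count = 0
    for y in range(height):
        for x in range(width):
            if not mask[y][x] or labels[y][x]:
                continue
            count += 1                      # start a new blob
            labels[y][x] = count
            queue = deque([(y, x)])
            while queue:
                cy, cx = queue.popleft()
                for dy in (-1, 0, 1):
                    for dx in (-1, 0, 1):
                        ny, nx = cy + dy, cx + dx
                        if (0 <= ny < height and 0 <= nx < width
                                and mask[ny][nx] and not labels[ny][nx]):
                            labels[ny][nx] = count
                            queue.append((ny, nx))
    return labels, count

# Hypothetical mask of red point pixels containing two separate blobs.
red_mask = [
    [1, 1, 0, 0],
    [0, 0, 0, 1],
    [0, 0, 0, 1],
]
red_labels, n_red_blobs = label_components(red_mask)
```

Running the same function separately on the red and the black point masks yields the "red blobs" and "black blobs" referred to above.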

In the context of dual ISH for HER2 detection, one skilled in the art will appreciate that the black dots are generally smaller than the red dots, and any counting rule must take this into account. For example, if the system 200 determines that the size of a dot is within the nominal size of a single dot, the classifier classifies the dot as either a HER2 dot or a chromosome 17 dot. If the dot is larger than the nominal size of a single dot, the system 200 determines the area of the cluster and divides it by the nominal dot area to determine how many dots fit within the cluster. The proximity of the dots classified as HER2 and chromosome 17 is also used in the counting algorithm. Of course, the counting rules can be modified by one skilled in the art depending on the specific ISH probes, assays, and protocols used.
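By way of non-limiting illustration, the area-based counting rule may be sketched as follows. The disclosure does not state how a fractional quotient is handled, so rounding to the nearest whole dot is an assumption here, and the function name `dots_in_blob` is hypothetical.

```python
import math


def dots_in_blob(blob_area, nominal_area):
    """Estimate how many dots a blob contains from its pixel area.

    A blob no larger than the nominal single-dot area counts as one dot.
    A larger blob is treated as a cluster: its area is divided by the
    nominal area and rounded to the nearest whole number (an assumption).
    """
    if nominal_area <= 0:
        raise ValueError("nominal_area must be positive")
    if blob_area <= nominal_area:
        return 1
    return max(1, math.floor(blob_area / nominal_area + 0.5))
```

For example, with a nominal dot area of 40 pixels, a blob of 120 pixels would be counted as three dots.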

In some embodiments, the average size (in pixels) of the black blobs is used to assign a certain number of seeds to black blob clusters. A small black dot should satisfy one or both of the following: (i) the voting strength (using difference of Gaussians (DoG) and radial symmetry on absorbance) is greater than a threshold (empirically determined and set to 5); and/or (ii) the absorbance is greater than an absorbance threshold (empirically determined and set to 0.33). In some embodiments, a small red dot should satisfy one or both of the following: (i) the voting strength (using DoG and radial symmetry on the A channel) is greater than a threshold (empirically determined and set to 15); and/or (ii) the A channel value (from the LAB color space, where a higher A value indicates redness) is greater than a threshold (e.g., empirically determined and set to 133, where the A channel ranges from 0 to 255) and the absorbance is greater than a threshold (e.g., empirically determined and set to 0.24). For example, a red dot with a diameter of less than 7 pixels may be considered a small red dot, and a red dot with a diameter equal to or greater than 7 pixels may be considered a large red dot. The average size of the black dots may be larger, and thus a small black dot may have a diameter of less than 10 pixels and a large black dot a diameter equal to or greater than 10 pixels.
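By way of non-limiting illustration, the size classes and small-dot acceptance criteria may be sketched as follows. The threshold values are those stated above; the function names are hypothetical, and reading red criterion (ii) as "A channel value and absorbance both above their thresholds" is one interpretation of the text, not a confirmed rule.

```python
# Diameter limits (in pixels) below which a dot is treated as "small".
SMALL_DIAMETER_LIMIT = {"red": 7, "black": 10}


def is_small(diameter_px, color):
    """Return True if a dot of the given color falls in the small size class."""
    return diameter_px < SMALL_DIAMETER_LIMIT[color]


def keep_small_black(vote, absorbance, vote_thr=5, abs_thr=0.33):
    """Accept a small black dot if either criterion (i) or (ii) is met."""
    return vote > vote_thr or absorbance > abs_thr


def keep_small_red(vote, a_value, absorbance,
                   vote_thr=15, a_thr=133, abs_thr=0.24):
    """Accept a small red dot if criterion (i) holds, or if criterion (ii)
    holds (A channel value and absorbance both above threshold; assumed
    grouping)."""
    return vote > vote_thr or (a_value > a_thr and absorbance > abs_thr)
```

Keeping the thresholds as keyword arguments reflects that they are empirically determined and may need to be retuned for a different probe, assay, or scanner.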

Once the first and second blobs (e.g., black and red blobs) are identified, a count of the first and second blobs is returned. In some embodiments, the ratio of first points to second points is calculated for each nucleus.

In some embodiments, based on the ratio, the expression level may be determined as over-expression, under-expression, etc. (step 315; see also step 415). In some embodiments, the score is compared to a clinically relevant threshold. For example, in the context of HER2 and chromosome 17 staining, the clinically relevant threshold may be the integer 2.
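By way of non-limiting illustration, the per-nucleus ratio and its comparison to a threshold may be sketched as follows. The handling of a nucleus with no second points is not addressed in the disclosure; returning `None` in that case is an assumption, as are the function names.

```python
def dot_ratio(first_count, second_count):
    """Ratio of first points to second points for one nucleus.

    Returns None when there are no second points, since the ratio is
    undefined in that case (assumed handling).
    """
    if second_count == 0:
        return None
    return first_count / second_count


def exceeds_threshold(ratio, threshold=2.0):
    """Return True only if the ratio is defined and strictly above threshold."""
    return ratio is not None and ratio > threshold
```

With the example threshold of 2, a nucleus with six first points and two second points has a ratio of 3.0 and exceeds the threshold, whereas a ratio of exactly 2.0 does not.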

In some embodiments, the ratio of all nuclei may be sorted or ordered. In some embodiments, the ratio may be stored in a database or storage module 240, either alone or in combination with cell/nucleus location information (e.g., x, y coordinates of the cell/nucleus).
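By way of non-limiting illustration, ratios stored together with nucleus coordinates may be ordered as follows. The record layout and the names `NucleusRecord` and `sort_by_ratio` are hypothetical; the disclosure does not specify a storage format.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class NucleusRecord:
    """One nucleus: its x, y location in the image and its calculated ratio."""
    x: float
    y: float
    ratio: float


def sort_by_ratio(records, descending=True):
    """Return a new list of records ordered by ratio (highest first by default)."""
    return sorted(records, key=lambda rec: rec.ratio, reverse=descending)
```

Because `sorted` is stable, nuclei with equal ratios keep their original relative order.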

Visualization module

The data generated during signal acquisition (step 314; see also step 414) and assessment (step 315; see also step 415) may be interpreted by a visualization module for rapid, consistent analysis. In some embodiments, an overlay image may be generated based on the derived data and the assessment made. For example, in the context of HER2, calculated ratios that exceed a particular threshold (e.g., 2) may be assigned a first indicator (e.g., a first color), while calculated ratios that are equal to or below the particular threshold (e.g., 2) may be assigned a second indicator (e.g., a second color). In some embodiments, if colors are assigned as the first and second indicators, the color may be drawn along the cell perimeter. In other embodiments, if colors are assigned as the first and second indicators, the cells may be filled with the color. The generated overlay image may then be superimposed over the whole slide image or any portion thereof (e.g., to facilitate conveying the results to a reviewer). In some embodiments, the overlay image may contain the calculated ratio of each cell/nucleus, with or without other indicators (such as other colors).

In some embodiments, a heat map may be generated with the identified regions that meet certain threshold constraints. For example, a heat map may be generated that illustrates areas having a calculated ratio between 2.0 and 2.5 in a first color, between 2.5 and 3.0 in a second color, and above 3.0 in a third color.
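By way of non-limiting illustration, the mapping from a calculated ratio to a heat map color may be sketched as follows. The color names are placeholders, and treating each lower boundary as inclusive (so that a ratio of exactly 3.0 falls in the third bin) is an assumption, since the disclosure does not say how boundary values are assigned.

```python
def heat_color(ratio, colors=("yellow", "orange", "red")):
    """Map a ratio to one of three heat map colors, or None below 2.0.

    Bins (lower bound inclusive, an assumption):
    [2.0, 2.5) -> first color, [2.5, 3.0) -> second color, >= 3.0 -> third.
    """
    if ratio < 2.0:
        return None
    if ratio < 2.5:
        return colors[0]
    if ratio < 3.0:
        return colors[1]
    return colors[2]
```

Regions whose ratio falls below 2.0 receive no color, so they would be left unshaded in the heat map.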

In other embodiments, a histogram may be derived based on the data and may be displayed along with any generated overlay images. In some embodiments, a histogram illustrating the distribution of different calculated ratios may be generated.
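By way of non-limiting illustration, a histogram of calculated ratios may be tallied as follows. The bin width of 0.5 and the function name `ratio_histogram` are assumptions chosen for the example; the disclosure does not specify the binning.

```python
import math
from collections import Counter


def ratio_histogram(ratios, bin_width=0.5):
    """Count ratios per bin and return {bin_lower_edge: count}.

    A ratio r falls in the bin whose lower edge is
    floor(r / bin_width) * bin_width. Only non-empty bins are returned,
    in increasing order of their lower edge.
    """
    counts = Counter(math.floor(r / bin_width) for r in ratios)
    return {round(k * bin_width, 6): counts[k] for k in sorted(counts)}
```

The returned mapping can be passed to any plotting routine for display alongside the overlay image.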

In other embodiments, a table may be generated based on the data and may be displayed along with any generated overlay images or histograms. In some embodiments, the table contains a list of a predetermined number of nuclei (e.g., 20) and a calculated ratio of the predetermined number of nuclei. In some embodiments, the table may also contain location information, such as the location of the cell/nucleus, or the mapped tissue region in which the cell/nucleus is located. In some embodiments, the table can further comprise a total dot count for each respective nucleic acid biomarker (e.g., a total black dot count and a total red dot count for each cell nucleus). In this way, the pathologist can review individual cells using the table and manually override any automatically calculated ratios or assessments as necessary.
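By way of non-limiting illustration, such a table, including a manual override by the reviewer, may be sketched as follows. The dictionary keys and the function names `build_table` and `override_ratio` are hypothetical, and the default limit of 20 nuclei follows the example given above.

```python
def build_table(nuclei, limit=20):
    """Build table rows for up to `limit` nuclei.

    Each input nucleus is a dict with keys "id", "x", "y", "first_dots",
    and "second_dots" (assumed layout). The ratio is None when a nucleus
    has no second dots.
    """
    rows = []
    for nucleus in nuclei[:limit]:
        first = nucleus["first_dots"]
        second = nucleus["second_dots"]
        rows.append({
            "id": nucleus["id"],
            "x": nucleus["x"],
            "y": nucleus["y"],
            "first_dots": first,
            "second_dots": second,
            "ratio": first / second if second else None,
            "overridden": False,
        })
    return rows


def override_ratio(rows, nucleus_id, new_ratio):
    """Replace the automatic ratio for one nucleus with a reviewer's value.

    Returns True if the nucleus was found in the table, otherwise False.
    """
    for row in rows:
        if row["id"] == nucleus_id:
            row["ratio"] = new_ratio
            row["overridden"] = True
            return True
    return False
```

Keeping an `overridden` flag on each row preserves a record of which values were calculated automatically and which were set by the reviewer.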

Other components for practicing embodiments of the disclosure

The system 200 of the present disclosure may be coupled to a specimen processing apparatus capable of performing one or more preparation processes on the tissue specimen. The preparation process may include, but is not limited to, specimen dewaxing, conditioning the specimen (e.g., cell conditioning), specimen staining, performing antigen retrieval, performing immunohistochemical staining (including labeling) or other reactions, and/or performing in situ hybridization (e.g., SISH, FISH, etc.) staining (including labeling) or other reactions, as well as other processes for preparing specimens for microscopy, microanalysis, mass spectrometry, or other analytical methods.

The processing apparatus may apply a fixative to the specimen. Fixatives can include cross-linking agents (e.g., aldehydes such as formaldehyde, paraformaldehyde, and glutaraldehyde, as well as non-aldehyde cross-linking agents), oxidizing agents (e.g., metal ions and complexes such as osmium tetroxide and chromic acid), protein-denaturing agents (e.g., acetic acid, methanol, and ethanol), fixatives of unknown mechanism (e.g., mercuric chloride, acetone, and picric acid), combination reagents (e.g., Carnoy's fixative, methacarn, Bouin's fluid, B5 fixative, Rossman's fluid, and Gendre's fluid), microwaves, and miscellaneous fixatives (e.g., excluded volume fixation and vapor fixation).

If the specimen is a paraffin-embedded sample, the sample may be deparaffinized using an appropriate deparaffinizing fluid. After the paraffin is removed, any number of substances may be applied to the specimen in succession. These substances can be used for pretreatment (e.g., reversing protein cross-linking, exposing nucleic acids, etc.), denaturation, hybridization, washing (e.g., stringency washing), detection (e.g., linking a visual or marker molecule to a probe), amplification (e.g., amplifying proteins, genes, etc.), counterstaining, coverslipping, etc.

The specimen processing apparatus may apply a variety of different chemicals to the specimen. These chemicals include, but are not limited to, stains, probes, reagents, rinses, and/or conditioners. These chemicals may be fluids (such as gases, liquids or gas/liquid mixtures) or the like. The fluid may be a solvent (e.g., polar solvent, non-polar solvent, etc.), a solution (e.g., an aqueous solution or other type of solution), or the like. The reagent may include, but is not limited to, a staining agent, a wetting agent, an antibody (e.g., a monoclonal antibody, a polyclonal antibody, etc.), an antigen recovery solution (e.g., an aqueous or non-aqueous antigen retrieval solution, an antigen recovery buffer, etc.), or the like. The probe may be an isolated nucleic acid or an isolated synthetic oligonucleotide, attached to a detectable label or reporter. Labels may include radioisotopes, enzyme substrates, cofactors, ligands, chemiluminescent or fluorescent agents, haptens, and enzymes.

The specimen processing apparatus may be an automated apparatus, such as the BENCHMARK XT instrument and the SYMPHONY instrument sold by Ventana Medical Systems, Inc. Ventana Medical Systems, Inc. is the assignee of a number of U.S. patents disclosing systems and methods for performing automated analyses, including U.S. Patent Nos. 5,650,327, 5,654,200, 6,296,809, 6,352,861, 6,827,901 and 6,943,029, and U.S. Published Patent Application Nos. 20030211630 and 20040052685, the disclosures of each of which are incorporated herein by reference in their entirety. Alternatively, the specimen may be processed manually.

After the specimen is processed, the user may transport the specimen slide to the imaging apparatus. In some embodiments, the imaging apparatus is a bright field imager slide scanner. Examples of bright field imagers are the iScan HT and DP200 (Griffin) bright field scanners sold by Ventana Medical Systems, Inc. In automated embodiments, the imaging apparatus is a digital pathology device as disclosed in International Patent Application No. PCT/US2010/002772 (Patent Publication No. WO/2011/049608), entitled IMAGING SYSTEM AND TECHNIQUES, or in U.S. Patent Application Publication No. 2014/0178169, filed on September 9, 2011, entitled IMAGING SYSTEMS, CASSETTES, AND METHODS OF USING THE SAME.

The imaging system or apparatus may be a multispectral imaging (MSI) system or a fluorescence microscopy system. The imaging system used herein is an MSI system. Generally speaking, MSI equips the analysis of pathology specimens with computerized microscope-based imaging systems by providing access to the spectral distribution of an image at the pixel level. Although a variety of multispectral imaging systems exist, they share the capability to form a multispectral image. A multispectral image is one that captures image data at specific wavelengths or at specific spectral bandwidths across the electromagnetic spectrum. These wavelengths may be singled out by optical filters or by other means capable of selecting a predetermined spectral component, including electromagnetic radiation at wavelengths beyond the visible range, such as infrared (IR).

The MSI system may comprise an optical imaging system, a portion of which contains a spectral selection system that is tunable to define a predetermined number N of discrete optical bands. The optical system may be adapted to image a tissue sample, illuminated in transmission by a broadband light source, onto an optical detector. In one embodiment, the optical imaging system may include a magnification system, such as a microscope, having a single optical axis generally spatially aligned with a single optical output of the optical system. As the spectral selection system is adjusted or tuned (e.g., with a computer processor), the system forms a sequence of images of the tissue such that the images are acquired in different discrete spectral bands. The apparatus may additionally include a display that shows at least one visually perceptible image of the tissue from the acquired sequence of images. The spectral selection system may include an optically dispersive element such as a diffraction grating, a set of optical filters such as thin-film interference filters, or any other system adapted to select, in response to user input or preprogrammed processor commands, a particular pass band from the spectrum of light transmitted from the light source through the sample toward the detector.

In an alternative embodiment, the spectral selection system defines a plurality of optical outputs corresponding to N discrete spectral bands. This type of system takes in the transmitted light output from the optical system and spatially redirects at least a portion of that light output along N spatially distinct optical paths, such that the sample is imaged in an identified spectral band onto a detector system along the optical path corresponding to that spectral band.

Embodiments of the subject matter and the operations described in this specification can be implemented in digital electronic circuitry, or in computer software, firmware, or hardware, including the structures disclosed in this specification and their structural equivalents, or in combinations of one or more of them. Embodiments of the subject matter described in this specification can be implemented as one or more computer programs, i.e., as one or more modules of computer program instructions, encoded on a computer storage medium for execution by, or to control the operation of, data processing apparatus. Any of the modules described herein may comprise logic that is executed by the processor. "Logic," as used herein, refers to any information having the form of instruction signals and/or data that may be applied to affect the operation of a processor. Software is an example of logic.

The computer storage medium may be, or may be included in, a computer-readable storage device, a computer-readable storage substrate, a random or serial access memory array or device, or any combination of one or more of them. Furthermore, although the computer storage medium is not a propagated signal, it can be the source or destination of computer program instructions encoded in an artificially generated propagated signal. The computer storage medium may also be, or be included in, one or more separate physical components or media, such as multiple CDs, diskettes, or other storage devices. The operations described in this specification may be implemented as operations performed by a data processing apparatus on data stored on one or more computer-readable storage devices or received from other sources.

The term "programmed processor" encompasses various devices, apparatuses, and machines that process data, including by way of example a programmable microprocessor, a computer, a system on a chip, or a plurality or combination of the foregoing. The apparatus may comprise special purpose logic circuitry, e.g., an FPGA (field programmable gate array) or an ASIC (application specific integrated circuit). The apparatus can include, in addition to hardware, code that creates an execution environment for the associated computer program, e.g., code that constitutes processor firmware, a protocol stack, a database management system, an operating system, a cross-platform runtime environment, a virtual machine, or a combination of one or more of them. The devices and execution environments may implement a variety of different computing model infrastructures, such as web services, distributed computing, and grid computing infrastructures.

A computer program (also known as a program, software, software application, script, or code) can be written in any form of programming language, including compiled or interpreted languages and declarative or procedural languages, and it can be deployed in any form, including as a stand-alone program or as a module, component, subroutine, object, or other unit suitable for use in a computing environment. A computer program may, but need not, correspond to a file in a file system. A program can be stored in a portion of a file that holds other programs or data (e.g., one or more scripts stored in a markup language document), in a single file dedicated to the program in question, or in multiple coordinated files (e.g., files that store one or more modules, subprograms, or portions of code). A computer program can be deployed to be executed on one computer or on multiple computers that are located at one site or distributed across multiple sites and interconnected by a communication network.

The processes and logic flows described in this specification can be performed by one or more programmable processors executing one or more computer programs to perform operations by operating on input data and generating output results. The processes and logic flows can also be performed by, and apparatus can also be implemented as, special purpose logic circuitry, e.g., an FPGA (field programmable gate array) or an ASIC (application-specific integrated circuit).

Processors suitable for the execution of a computer program include, by way of example, both general and special purpose microprocessors, and any one or more processors of any kind of digital computer. Generally, a processor will receive instructions and data from a read-only memory or a random access memory or both. The essential elements of a computer are a processor for performing operations in accordance with instructions and one or more memory devices for storing instructions and data. Generally, a computer will also include, or be operatively coupled to receive data from or transfer data to, or both, one or more mass storage devices for storing data (e.g., magnetic disks, magneto-optical disks, or optical disks). However, a computer need not have such devices. Further, the computer may be embedded in another device, e.g., a mobile telephone, a personal digital assistant (PDA), a mobile audio or video player, a game controller, a Global Positioning System (GPS) receiver, or a portable storage device (e.g., a Universal Serial Bus (USB) flash drive), to name a few. Devices suitable for storing computer program instructions and data include all forms of non-volatile memory, media, and memory devices, including by way of example semiconductor memory devices, e.g., EPROM, EEPROM, and flash memory devices; magnetic disks, such as internal hard disks or removable disks; magneto-optical disks; and CD-ROM and DVD-ROM disks. The processor and the memory may be supplemented by, or incorporated in, special purpose logic circuitry.

To facilitate interaction with a user, embodiments of the subject matter described in this specification can be implemented on a computer having a display device, e.g., an LCD (liquid crystal display), an LED (light emitting diode) display, or an OLED (organic light emitting diode) display, for displaying information to the user, and a keyboard and a pointing device, e.g., a mouse or a trackball, by which the user can provide input to the computer. In some embodiments, a touch screen may be used to display information and receive user input. Other kinds of devices may also be used to facilitate interaction with the user; for example, feedback provided to the user can be any form of sensory feedback, such as visual feedback, auditory feedback, or tactile feedback; and input from the user can be received in any form, including acoustic, speech, or tactile input. In addition, a computer may interact with a user by sending documents to and receiving documents from a device used by the user; for example, by sending a web page to a web browser on a user's client device in response to a request received from the web browser.

Embodiments of the subject matter described in this specification can be implemented in a computing system that includes a back-end component, e.g., a data server, or that includes a middleware component, e.g., an application server, or that includes a front-end component, e.g., a client computer having a graphical user interface or a web browser through which a user can interact with an implementation of the subject matter described in this specification, or any combination of one or more such back-end, middleware, or front-end components. The components of the system can be interconnected by any form or medium of digital data communication, e.g., a communication network. Examples of communication networks include local area networks ("LANs") and wide area networks ("WANs"), inter-networks (e.g., the Internet), and peer-to-peer networks (e.g., ad hoc peer-to-peer networks). For example, the network 20 of FIG. 1 may include one or more local area networks.

The computing system may include any number of clients and servers. A client and a server are generally remote from each other and typically interact through a communication network. The relationship of client and server arises by virtue of computer programs running on the respective computers and having a client-server relationship to each other. In some embodiments, a server transmits data (e.g., an HTML page) to a client device (e.g., for displaying data to and receiving input from a user interacting with the client device). Data generated at the client device (e.g., as a result of user interaction) may be received at the server from the client device.

Examples of the invention

The above table lists examples of results returned after image analysis for detection of the HER2 protein biomarker and HER2 and chromosome 17 nucleic acid biomarkers. The table also illustrates the calculated ratio of HER2 to chromosome 17, and the number of relevant cells detected.

Other embodiments

A system for assessing genetic aberrations in an image of a biological sample stained for the presence of at least one nucleic acid biomarker, the system comprising: (i) one or more processors, and (ii) one or more memories coupled with the one or more processors, the memories for storing computer-executable instructions that, when executed by the one or more processors, cause the system to perform operations comprising: running a detection algorithm to automatically detect and identify cells in the first image that are stained for the presence of the at least one protein biomarker that meet a predetermined protein biomarker staining criterion; deriving (e.g., automatically) a tumor tissue region in the first image encompassing identified cells that meet the predetermined protein biomarker staining criteria; performing an automatic registration of the first and second images with a common coordinate system such that the derived tumor tissue region in the first image is mapped to the second image to provide a mapped tumor tissue region, wherein the second image comprises a signal corresponding to the presence of at least one nucleic acid biomarker; automatically identifying points within the mapped tumor tissue region that correspond to the signal from the at least one nucleic acid biomarker; and assessing (e.g., automatically assessing) whether tumor nuclei in the mapped tumor tissue region in the second image have genetic aberration based on the identified points.

A method of assessing genetic aberrations in an image of a biological sample stained for the presence of at least one nucleic acid biomarker, the method comprising: automatically detecting cells in the first image stained for the presence of the at least one protein biomarker that meet a predetermined protein biomarker staining criterion; deriving a tumor tissue region in the first image encompassing identified cells that meet the predetermined protein biomarker staining criteria; automatically registering a first image and a second image with a common coordinate system such that the derived tumor tissue region in the first image is mapped to the second image to provide a mapped tissue region, wherein the second image comprises a signal corresponding to the presence of at least one nucleic acid biomarker; automatically identifying points within the mapped tissue region that correspond to the signal from the at least one nucleic acid biomarker; and assessing (e.g., automatically) whether tumor nuclei in the mapped tissue region in the second image have genetic aberration based on the identified points corresponding to the at least one nucleic acid biomarker.

A non-transitory computer-readable medium storing instructions for assessing genetic aberrations in a biological sample stained for the presence of at least one nucleic acid biomarker, the instructions comprising: running a detection algorithm to automatically detect and identify cells in the first image that are stained for the presence of the at least one protein biomarker that meet a predetermined protein biomarker staining criterion; deriving a tumor tissue region in the first image encompassing identified cells that meet the predetermined protein biomarker staining criteria; performing an automatic registration of the first and second images with a common coordinate system such that the derived tumor tissue region in the first image is mapped to the second image to provide a mapped tissue region, wherein the second image comprises a signal corresponding to the presence of at least one nucleic acid biomarker; automatically detecting a point within the mapped tissue region corresponding to the signal from the at least one nucleic acid biomarker; counting all detected points within each tumor cell nucleus within each mapped tissue region; and assessing (e.g., automatically) whether each tumor nucleus in each mapped region has a genetic aberration based on the total number of counted points in each nucleus.

All U.S. patents, U.S. patent application publications, U.S. patent applications, foreign patents, and non-patent publications referred to in this specification and/or listed in the application data sheet, are incorporated herein by reference, in their entirety. Aspects of the embodiments can be modified, if necessary, to employ concepts of the various patents, applications and publications to provide yet further embodiments.

Although the present disclosure has been described with reference to a few illustrative embodiments, it should be understood that numerous other modifications and embodiments can be devised by those skilled in the art that will fall within the spirit and scope of the principles of this disclosure. More specifically, reasonable variations and modifications are possible in the component parts and/or arrangements of the subject combination arrangement within the scope of the foregoing disclosure, the drawings and the appended claims without departing from the spirit of the disclosure. In addition to variations and modifications in the described components and/or arrangements, alternative uses will also be apparent to those skilled in the art.
