This is Part Two in a series. In Part One, learn how Reveal’s digital assay pipeline addresses common logistical and technical challenges associated with digital image analysis.
What are the current capabilities of the pipeline, and how do they relate to different applications?
A wide range of histological staining methods has been refined over decades to visualize aspects of specimen morphology or cell phenotype without compromising the spatial context of the tissue. These stains vary greatly in pigment and pattern, depending on the relevant experimental or disease feature. While the availability of these laboratory techniques opens up an expanse of research and diagnostic possibilities, the computational approaches to quantifying these features typically share a fundamental structure: whole tissue detection, region of interest segmentation, nuclear segmentation and classification, and statistical summary.
To permit analysis of highly disparate histological image datasets like those shown in Figure 1, the pipeline framework implements both classical image processing capabilities and state-of-the-art deep learning architectures.
- Image Processing: The two input images shown in Figure 2 differ in imaging modality, tissue type, and file type. Despite these differences, a similar series of image processing functions can be applied to separate (i.e., segment) tissue from background, after optimizing a minimal number of parameters.
Within the pipeline, each step of the image processing workflow is termed an operation, and many operations are combined to form a sub-pipeline. The goal of a sub-pipeline is to achieve a specific task within the end-to-end analysis pipeline. Once designed, the sub-pipeline is easily adapted to future datasets, thus reducing the amount of development time required to solve a similar problem (Figure 3).
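The relationship between operations and a sub-pipeline can be illustrated with a minimal sketch. This is not Reveal's implementation: the names `invert`, `threshold`, and `sub_pipeline`, the cutoff values, and the tiny list-of-lists image are all assumptions chosen for illustration. It shows how one chain of operations might be reused across modalities by changing only a parameter or dropping a step.

```python
from functools import reduce
from typing import Callable, List

Image = List[List[float]]              # grayscale image, intensities in [0, 1]
Operation = Callable[[Image], Image]   # every operation maps an image to an image


def invert(img: Image) -> Image:
    """Flip intensities so dark tissue on a bright background becomes bright."""
    return [[1.0 - px for px in row] for row in img]


def threshold(cutoff: float) -> Operation:
    """Build an operation that marks pixels above `cutoff` as tissue (1.0)."""
    def op(img: Image) -> Image:
        return [[1.0 if px > cutoff else 0.0 for px in row] for row in img]
    return op


def sub_pipeline(*operations: Operation) -> Operation:
    """Chain operations left to right into a single reusable callable."""
    return lambda img: reduce(lambda acc, op: op(acc), operations, img)


# Hypothetical brightfield detector: tissue is darker than the background.
detect_tissue_brightfield = sub_pipeline(invert, threshold(cutoff=0.5))
# Hypothetical fluorescence detector: signal is already bright on dark,
# so the invert step is dropped and only the cutoff is re-tuned.
detect_tissue_fluorescence = sub_pipeline(threshold(cutoff=0.2))

brightfield = [[0.95, 0.90, 0.30],
               [0.92, 0.25, 0.20]]
fluorescence = [[0.05, 0.60],
                [0.10, 0.90]]

brightfield_mask = detect_tissue_brightfield(brightfield)
fluorescence_mask = detect_tissue_fluorescence(fluorescence)
```

A production workflow would include further operations (for example smoothing or removal of small objects), but the composition pattern stays the same.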
The flexibility of technology embedded within the pipeline framework extends to deep learning architectures that can be trained to perform a more specific task.
- Semantic Segmentation: Because tissue sections or biopsies can contain multiple biologically distinct structures, it is often of interest to separate the tissue into human-interpretable categories (or “classes”) to measure morphological features in the image. This type of analysis problem can be addressed using semantic segmentation deep learning architectures. For example, reporting the area and percentage of fibrotic tissue within a Masson’s Trichrome-stained liver biopsy offers a means to score the severity of fibrosis in non-alcoholic steatohepatitis (NASH) from 0 (no fibrosis) to 4 (cirrhosis).
Semantic segmentation can also be leveraged to limit subsequent analysis steps to a specific structure within the larger tissue section. Specifically, a necessary step of detecting Mismatch Repair Deficiency in colon tumors is to limit prediction of genetic status to colon tumor cells and associated stromal regions.
The pipeline framework includes implementation of semantic segmentation models to serve analysis endpoints exemplified in Figure 5. Note that this family of models can be trained to segment features of different scales. Extracellular matrix fibrils can appear as fine structures only tens of pixels wide, whereas functionally distinct regions of a colon specimen span thousands of pixels.
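Once a semantic segmentation model has assigned a class to each pixel, the area and percentage endpoints mentioned for the fibrosis example reduce to pixel counting. The sketch below assumes a label mask stored as nested lists, hypothetical class ids (0 = background, 1 = parenchyma, 2 = fibrosis), and a hypothetical scanner resolution of 0.5 microns per pixel; none of these come from the pipeline itself.

```python
from collections import Counter
from typing import Dict, List


def class_area_summary(mask: List[List[int]],
                       class_names: Dict[int, str],
                       microns_per_pixel: float,
                       background_id: int = 0) -> Dict[str, Dict[str, float]]:
    """Report area (mm^2) and percentage of tissue for each labeled class."""
    counts = Counter(label for row in mask for label in row)
    # Tissue is every pixel that is not background.
    tissue_pixels = sum(n for label, n in counts.items() if label != background_id)
    pixel_area_mm2 = (microns_per_pixel / 1000.0) ** 2

    summary: Dict[str, Dict[str, float]] = {}
    for label, name in class_names.items():
        if label == background_id:
            continue
        n = counts.get(label, 0)
        summary[name] = {
            "area_mm2": n * pixel_area_mm2,
            "percent_of_tissue": 100.0 * n / tissue_pixels if tissue_pixels else 0.0,
        }
    return summary


# Toy 3 x 4 mask; class ids and names are illustrative assumptions.
mask = [[0, 1, 1, 1],
        [0, 1, 2, 2],
        [0, 1, 1, 2]]
summary = class_area_summary(mask,
                             class_names={1: "parenchyma", 2: "fibrosis"},
                             microns_per_pixel=0.5)
fibrosis_percent = summary["fibrosis"]["percent_of_tissue"]
```

Mapping a percentage such as `fibrosis_percent` onto a 0 to 4 severity score would require validated cutoffs, which are deliberately not guessed at here.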
- Object Detection: Consider two diseases that require detecting specific cell phenotypes for diagnosis:
- Ballooning cells observed in hematoxylin and eosin-stained liver specimens indicate status of Non-Alcoholic Fatty Liver Disease (NAFLD).
- Reed-Sternberg cells, which are visible in PAX5-PDL1 dual immunohistochemistry-stained lymph node biopsies, suggest presence of Hodgkin lymphoma.
Although NASH (the progressive form of NAFLD) and Hodgkin lymphoma are pathologically different, detecting the salient feature of each disease can be framed as the same object detection problem: the tissue within the image is evaluated, and any occurrence of the relevant cell phenotype is flagged or counted. The snapshots in Figure 6 illustrate how different NASH and lymphoma tissue samples look and how similar the object detection model output is for both.
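The shared output format is what makes the two problems interchangeable downstream. As a hedged illustration, the sketch below assumes each detection is a label, a confidence score, and a bounding box; the `Detection` type, the label strings, the scores, and the 0.5 confidence cutoff are hypothetical and do not describe the actual model output.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Iterable, Tuple


@dataclass(frozen=True)
class Detection:
    label: str                        # predicted cell phenotype
    score: float                      # model confidence in [0, 1]
    box: Tuple[int, int, int, int]    # x_min, y_min, x_max, y_max in pixels


def count_detections(detections: Iterable[Detection],
                     min_score: float = 0.5) -> Counter:
    """Count detections per label, ignoring those below the confidence cutoff."""
    return Counter(d.label for d in detections if d.score >= min_score)


# Made-up detections for an H&E liver image and an IHC lymph node image.
liver = [
    Detection("ballooning_cell", 0.91, (120, 40, 160, 85)),
    Detection("ballooning_cell", 0.34, (300, 210, 338, 250)),  # below cutoff
    Detection("ballooning_cell", 0.77, (512, 90, 556, 131)),
]
lymph_node = [
    Detection("reed_sternberg_cell", 0.88, (64, 64, 110, 112)),
]

liver_counts = count_detections(liver)
lymph_counts = count_detections(lymph_node)
```

The same counting function serves both stains because nothing in it depends on the tissue or the phenotype being detected.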
- Nuclear Segmentation: A key component of many digital pathology experiments is the ability to segment nuclei within the tissue, so that cells of a particular phenotype can be identified and measured. Three nuclear segmentation neural networks have been implemented in the pipeline framework to facilitate per-cell analysis of many H&E-, immunohistochemistry-, and immunofluorescence-stained tissue specimens. Metrics describing the density, percentage, or spatial distribution of a subpopulation of cells within a specimen all rely on accurate nuclear segmentation.
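To make the dependence on nuclear segmentation concrete, the following sketch computes one example of each metric type named above (percentage, density, and a simple spatial statistic) from a list of segmented nuclei. The `Nucleus` record, the use of centroids in microns, and the choice of mean nearest-neighbor distance as the spatial measure are assumptions for illustration only.

```python
import math
from dataclasses import dataclass
from typing import Dict, Optional, Sequence


@dataclass(frozen=True)
class Nucleus:
    x_um: float       # centroid x position in microns
    y_um: float       # centroid y position in microns
    positive: bool    # True if the cell belongs to the phenotype of interest


def summarize_nuclei(nuclei: Sequence[Nucleus],
                     tissue_area_mm2: float) -> Dict[str, Optional[float]]:
    """Summarize a positive subpopulation: percentage, density, and spacing."""
    if tissue_area_mm2 <= 0:
        raise ValueError("tissue_area_mm2 must be positive")
    positives = [n for n in nuclei if n.positive]

    # Distance from each positive nucleus to its closest positive neighbor.
    nearest = []
    for i, a in enumerate(positives):
        distances = [math.dist((a.x_um, a.y_um), (b.x_um, b.y_um))
                     for j, b in enumerate(positives) if j != i]
        if distances:
            nearest.append(min(distances))

    return {
        "percent_positive": 100.0 * len(positives) / len(nuclei) if nuclei else 0.0,
        "positive_per_mm2": len(positives) / tissue_area_mm2,
        # Undefined when fewer than two positive nuclei are present.
        "mean_nearest_neighbor_um": sum(nearest) / len(nearest) if nearest else None,
    }


nuclei = [
    Nucleus(0.0, 0.0, True),
    Nucleus(3.0, 4.0, True),
    Nucleus(10.0, 10.0, False),
    Nucleus(20.0, 0.0, False),
]
metrics = summarize_nuclei(nuclei, tissue_area_mm2=0.01)
```

A missed or merged nucleus changes every value in `metrics`, which is why segmentation accuracy matters so much for these endpoints.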
- Multiple Instance Learning: Not all clinically relevant data are continuous quantitative endpoints. In many cases, a categorical label that describes a whole tissue specimen (e.g. healthy vs cancerous) is a useful decision support tool. The pipeline framework’s Multiple Instance Learning (MIL) architecture offers a solution for whole-slide level tasks such as these.
An existing application of this strategy within Reveal is an algorithm that predicts the genetic status of colon and endometrial tumor samples. Clinical research has shown that tumors driven by a defective DNA repair mechanism, termed mismatch repair (MMR) deficiency, are more likely to respond to treatment with immune checkpoint inhibitors. Using only an H&E tumor image as input, the digital assay can segment tissue, identify tumor regions, evaluate those regions for MMR deficiency using a MIL model, and report a binary label: MMR deficient or MMR proficient. This binary classification may inform an oncologist’s treatment decision in a clinical setting.
In addition to reporting a slide-level prediction, Reveal’s implementation of the MIL model leverages an attention mechanism that offers deeper biological insight (Figure 7). By generating a heatmap of the tumor areas that contributed most to the final slide-level prediction, the algorithm provides explainability: its output indicates not only whether the tumor is likely to be driven by MMR deficiency, but also where the tumor is most likely to be MMR deficient.
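The idea of attention-based MIL can be sketched in a few lines. This is a generic, simplified illustration and not Reveal's model: each tumor tile is assumed to have already been converted to a small embedding vector, the attention score is a plain dot product with a tanh-squashed embedding (a full model would learn additional projection weights), and every number below is invented rather than trained. The per-tile attention weights are the values that would be painted back onto the slide as a heatmap.

```python
import math
from typing import List, Sequence, Tuple


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def softmax(scores: Sequence[float]) -> List[float]:
    """Numerically stable softmax: non-negative weights that sum to 1."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_mil_predict(tile_embeddings: Sequence[Sequence[float]],
                          attention_vector: Sequence[float],
                          classifier_weights: Sequence[float],
                          classifier_bias: float) -> Tuple[float, List[float]]:
    """Return (slide-level probability, per-tile attention weights)."""
    # 1. Score each tile, then normalize the scores across the whole slide.
    scores = [dot(attention_vector, [math.tanh(v) for v in emb])
              for emb in tile_embeddings]
    weights = softmax(scores)
    # 2. Pool tiles into one slide embedding as an attention-weighted average.
    dim = len(tile_embeddings[0])
    slide_embedding = [sum(w * emb[k] for w, emb in zip(weights, tile_embeddings))
                       for k in range(dim)]
    # 3. Classify the slide embedding with a logistic output.
    logit = dot(classifier_weights, slide_embedding) + classifier_bias
    return 1.0 / (1.0 + math.exp(-logit)), weights


# Three hypothetical tumor tiles with 2-dimensional embeddings (invented values).
tiles = [[0.1, 0.0], [2.0, 1.5], [0.2, 0.1]]
probability, attention = attention_mil_predict(
    tiles,
    attention_vector=[1.0, 1.0],      # illustrative, not trained
    classifier_weights=[1.0, 1.0],    # illustrative, not trained
    classifier_bias=-1.5,
)
label = "MMR deficient" if probability >= 0.5 else "MMR proficient"
most_influential_tile = attention.index(max(attention))
```

Only the slide-level label is needed for training in this setup; the tile-level attention weights come for free, which is what makes the heatmap possible without tile-level annotation.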
Modularized Design (i.e. Plug and Play)
Each element described above can be packaged into an independent sub-pipeline with standardized inputs and outputs. These modular components can be added, removed, or rearranged to address previously unseen datasets and derive novel analysis endpoints. Figure 8 illustrates how analysis sub-pipelines can be assembled into an end-to-end pipeline, thereby achieving diverse analysis goals. We have also developed a graphical user interface that provides “drag and drop” assembly, making the framework accessible to users who do not write code.
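One way to picture the plug-and-play design is a registry of named sub-pipelines that all accept and return the same kind of object, so an end-to-end pipeline is just an ordered list of names. The sketch below is an assumption-laden illustration: the `REGISTRY`, the `register` decorator, the step names, and the dictionary "context" passed between steps are hypothetical, and each step returns a placeholder string where a real sub-pipeline would return masks, regions, or predictions.

```python
from typing import Any, Callable, Dict, List

Context = Dict[str, Any]                   # standardized input and output
SubPipeline = Callable[[Context], Context]
REGISTRY: Dict[str, SubPipeline] = {}


def register(name: str) -> Callable[[SubPipeline], SubPipeline]:
    """Decorator that makes a sub-pipeline available by name."""
    def decorator(fn: SubPipeline) -> SubPipeline:
        REGISTRY[name] = fn
        return fn
    return decorator


@register("tissue_detection")
def tissue_detection(ctx: Context) -> Context:
    return {**ctx, "tissue_mask": f"mask({ctx['image']})"}


@register("tumor_segmentation")
def tumor_segmentation(ctx: Context) -> Context:
    return {**ctx, "tumor_regions": f"tumor({ctx['tissue_mask']})"}


@register("nuclear_segmentation")
def nuclear_segmentation(ctx: Context) -> Context:
    return {**ctx, "nuclei": f"nuclei({ctx['tissue_mask']})"}


@register("slide_classification")
def slide_classification(ctx: Context) -> Context:
    return {**ctx, "slide_label": f"label({ctx['tumor_regions']})"}


def assemble(step_names: List[str]) -> SubPipeline:
    """Build an end-to-end pipeline from an ordered list of registered names."""
    steps = [REGISTRY[name] for name in step_names]   # KeyError if unknown
    def run(ctx: Context) -> Context:
        for step in steps:
            ctx = step(ctx)
        return ctx
    return run


# Two different endpoints assembled from the same building blocks.
slide_level = assemble(["tissue_detection", "tumor_segmentation", "slide_classification"])
cell_level = assemble(["tissue_detection", "nuclear_segmentation"])

slide_result = slide_level({"image": "slide_001"})
cell_result = cell_level({"image": "slide_001"})
```

Because a pipeline is defined by a list of names, the same structure could in principle be produced by a configuration file or a graphical editor rather than by code.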
Current Capabilities and Beyond
Reveal has implemented a rapidly expanding library of over 120 operators in the pipeline framework, which together offer comprehensive functionality spanning the computer vision and ML/AI domains. The examples above emphasize the pipeline framework’s application to whole slide image analysis within digital pathology. However, the generalized implementation of machine learning and deep learning tools within the pipeline also permits integrated analysis of multiple forms of data. Extending the pipeline’s data reader functionality to other high-throughput data sources, including genomics, proteomics, and metabolomics, positions the pipeline for novel multi-omic data analysis solutions.
Together, imageDx and the pipeline framework offer tools that are fundamental to streamlined design of whole slide image analysis pipelines. Their integration accelerates bespoke model development and facilitates adaptability to novel image datasets, research questions, and diagnostic endpoints.
About the Author
Stacy Littlechild is Associate Director of Digital Assay Development at Reveal Biosciences. Prior to her current role in the digital pathology space, Stacy studied corneal extracellular matrix biology and intracellular signaling in cancer using many forms of microscopy and computational image analysis. Stacy holds a PhD in Vision Science from Cardiff University, where she was both an International Scholarship recipient and a President’s Research Scholar. Her postdoctoral training was at the Salk Institute for Biological Studies.