A Novel Pipeline Framework for Digital Assay Development in Pathology and Multi-omics – Part Two

This is Part Two in a series. Part One describes how Reveal’s digital assay pipeline addresses common logistical and technical challenges associated with digital image analysis.

What are the current capabilities of the pipeline, and how do they map to different applications?

A breadth of histological staining methods has been refined over decades to visualize aspects of specimen morphology or cell phenotype without compromising the spatial context of the tissue. These stains vary greatly in pigment and pattern, depending on the relevant experimental or disease feature. While the availability of these laboratory techniques opens up an expanse of research and diagnostic possibilities, the computational approaches to quantifying these features typically share a fundamental structure: whole tissue detection, region of interest segmentation, nuclear segmentation and classification, and statistical summary.

To permit analysis of highly disparate histological image datasets like those shown in Figure 1, the pipeline framework implements cutting-edge image processing capabilities and state-of-the-art deep learning architectures.

Figure 1. Histological staining methods are applicable to most biological tissues and have been developed to visualize a variety of cellular or structural features. These methods give rise to the range of colors, textures, and research/diagnostic opportunities represented in the tiles above.
  • Image Processing: The two input images shown in Figure 2 differ in imaging modality, tissue, and file type. Despite these differences, a similar series of image processing functions can be applied to separate (i.e., segment) tissue from background after optimizing a minimal number of parameters.
Figure 2. The pipeline contains adaptable image processing algorithms that extend to multiple image modalities, including the brightfield (left) and immunofluorescent (right) images shown above.
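As a concrete illustration of the tissue-versus-background step, the sketch below separates dark, stained tissue from a bright background using Otsu thresholding, implemented here in plain NumPy. This is an illustrative stand-in for the pipeline's image processing operations, not its actual code; the function names are ours.

```python
import numpy as np

def otsu_threshold(gray):
    """Find the intensity threshold that maximizes between-class variance."""
    hist, bin_edges = np.histogram(gray, bins=256, range=(0, 256))
    bin_mids = (bin_edges[:-1] + bin_edges[1:]) / 2
    total = gray.size
    w0 = np.cumsum(hist)                 # pixels at or below each threshold
    w1 = total - w0                      # pixels above each threshold
    sum0 = np.cumsum(hist * bin_mids)
    mean0 = np.where(w0 > 0, sum0 / np.maximum(w0, 1), 0)
    mean1 = np.where(w1 > 0, (sum0[-1] - sum0) / np.maximum(w1, 1), 0)
    between_var = w0 * w1 * (mean0 - mean1) ** 2
    return bin_mids[np.argmax(between_var)]

def segment_tissue(gray_image):
    """Return a boolean tissue mask for a brightfield-style grayscale image,
    where tissue is darker than the (bright) slide background."""
    return gray_image < otsu_threshold(gray_image)
```

For immunofluorescent images the inequality would flip (signal is brighter than background), which is one example of the small parameter changes mentioned above.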

Within the pipeline, each step of the image processing workflow is termed an operation, and many operations are combined to form a sub-pipeline. The goal of a sub-pipeline is to achieve a specific task within the end-to-end analysis pipeline. Once designed, the sub-pipeline is easily adapted to future datasets, thus reducing the amount of development time required to solve a similar problem (Figure 3).
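The operation/sub-pipeline pattern can be sketched in a few lines of Python. The class and function names below are illustrative, not the framework's actual API: each "operation" is a callable with a standardized input and output, and a sub-pipeline simply chains them.

```python
class SubPipeline:
    """Chains operations; each operation maps one standardized payload
    to the next, so sub-pipelines can be reused across datasets."""

    def __init__(self, operations):
        self.operations = list(operations)

    def run(self, data):
        for op in self.operations:
            data = op(data)
        return data

# Tiny numeric "operations" standing in for real image-processing steps.
def normalize(values):
    peak = max(values)
    return [v / peak for v in values]

def threshold(values):
    return [v > 0.5 for v in values]

tissue_detection = SubPipeline([normalize, threshold])
```

Because the interface between operations is fixed, swapping one operation for another (or reusing the whole sub-pipeline on a new dataset) does not require touching the rest of the chain.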

Figure 3. The pipeline framework contains flexible data handling, image processing, and deep learning capabilities, termed “operations” (represented by dark blue). A series of operations are combined to accomplish a specific task within an end-to-end analysis workflow, referred to as a sub-pipeline (light blue). The standardization of a sub-pipeline’s input and output permits re-use and adaptability to future, diverse datasets.
Figure 4. A Masson’s Trichrome-stained liver biopsy (left) is analyzed using semantic segmentation to quantify fibrotic tissue (green, right).

The flexibility of technology embedded within the pipeline framework extends to deep learning architectures that can be trained to perform a more specific task.

  • Semantic Segmentation: Because tissue sections or biopsies can contain multiple biologically distinct structures, it is often of interest to separate the tissue into human-interpretable categories (or “classes”) to measure morphological features in the image. This type of analysis problem can be addressed using semantic segmentation deep learning architectures. For example, reporting the area and percentage of fibrotic tissue within a Masson’s Trichrome-stained liver biopsy offers a means to score severity of Non-Alcoholic Steatohepatitis from 0 (no fibrosis) to 4 (cirrhosis).
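The statistical summary downstream of such a model is straightforward: given a per-pixel class mask, report the fibrotic fraction of tissue area. The class indices below are hypothetical stand-ins for a real model's label map.

```python
import numpy as np

# Illustrative class indices; a trained model's label map would define these.
BACKGROUND, PARENCHYMA, FIBROSIS = 0, 1, 2

def fibrosis_percentage(class_mask):
    """Percent of tissue area (non-background pixels) labeled fibrotic."""
    tissue = class_mask != BACKGROUND
    if tissue.sum() == 0:
        return 0.0
    fibrotic = np.logical_and(class_mask == FIBROSIS, tissue)
    return 100.0 * fibrotic.sum() / tissue.sum()
```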

Semantic segmentation can also be leveraged to limit subsequent analysis steps to a specific structure within the larger tissue section. For example, a necessary step in detecting mismatch repair deficiency in colon tumors is limiting the prediction of genetic status to colon tumor cells and associated stromal regions.

The pipeline framework includes implementation of semantic segmentation models to serve analysis endpoints exemplified in Figure 5. Note that this family of models can be trained to segment features of different scales. Extracellular matrix fibrils can appear as fine structures only tens of pixels wide, whereas functionally distinct regions of a colon specimen span thousands of pixels.

Figure 5. A semantic segmentation model separates colon tissue into eight biologically relevant categories: colorectal adenocarcinoma epithelium, cancer-associated stroma, normal colon mucosa, mucus, debris, adipose tissue, lymphocytes, and smooth muscle. The categorization of the sample allows prediction of the tumor’s genetic status from identified tumor epithelial and tumor-associated stromal regions.
  • Object Detection: Consider two diseases that require detecting specific cell phenotypes for diagnosis:
    • Ballooning cells observed in hematoxylin and eosin-stained liver specimens indicate status of Non-Alcoholic Fatty Liver Disease (NAFLD). 
    • Reed-Sternberg cells, which are visible in PAX5-PDL1 dual immunohistochemistry-stained lymph node biopsies, suggest presence of Hodgkin lymphoma. 

Although NAFLD and Hodgkin lymphoma are pathologically distinct, detecting the salient disease feature in each can be solved with an object detection model architecture: the tissue within the image is evaluated, and any occurrence of the relevant cell phenotype is flagged or counted. The snapshots in Figure 6 illustrate the differences between the liver and lymph node tissue samples and highlight the similarity of the object detection model output.

Figure 6. Ballooning cells (top panels) and Reed-Sternberg cells (bottom panels) can be detected from H&E-stained liver tissue and dual-IHC-stained lymph node tissue, respectively, using an object detection deep learning architecture.
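Downstream of the detector, the "flag or count" step described above is simple to sketch. The detection tuple format and thresholds below are assumptions for illustration, not the pipeline's actual output schema.

```python
# Detections are assumed to arrive as (x, y, score) tuples from an
# object detection model; the format and thresholds are illustrative.

def count_cells(detections, min_score=0.5):
    """Count detections that clear a confidence threshold."""
    return sum(1 for (_x, _y, score) in detections if score >= min_score)

def flag_specimen(detections, min_score=0.5, min_count=1):
    """Flag a specimen if enough high-confidence target cells are found."""
    return count_cells(detections, min_score) >= min_count
```

The same two functions serve both use cases: counting ballooning cells to grade NAFLD, or flagging a lymph node biopsy for review when Reed-Sternberg cells are detected.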
  • Nuclear Segmentation: A key component of many digital pathology experiments is the ability to segment nuclei within the tissue, so that cells of a particular phenotype can be identified and measured. Three nuclear segmentation neural networks have been implemented in the pipeline framework to facilitate per-cell analysis of H&E-, immunohistochemistry-, and immunofluorescence-stained tissue specimens. Metrics describing the density, percentage, or spatial distribution of a subpopulation of cells within a specimen are all reliant on accurate nuclear segmentation.
  • Multiple Instance Learning: Not all clinically relevant data are continuous quantitative endpoints. In many cases, a categorical label that describes a whole tissue specimen (e.g. healthy vs cancerous) is a useful decision support tool. The pipeline framework’s Multiple Instance Learning (MIL) architecture offers a solution for whole-slide level tasks such as these.
Figure 7. A colon tumor H&E sample is evaluated for mismatch repair deficiency (left). Green and yellow regions in the attention heatmap (right) correspond to areas that are likely to exhibit deficiency in the Mismatch Repair mechanism.

An existing application of this strategy within Reveal is an algorithm capable of predicting the genetic status of colon and endometrial tumor samples. Clinical research has shown that tumors driven by a defective DNA repair mechanism, termed mismatch repair (MMR) deficiency, are more likely to respond to treatment with immune checkpoint inhibitors. Using solely an H&E tumor image as input, the digital assay can segment tissue, identify tumor regions, evaluate those regions for MMR deficiency using a MIL model, and report a binary label: MMR deficient or MMR proficient. This binary classification may inform an oncologist’s treatment decision in a clinical setting. Beyond reporting a slide-level prediction, Reveal’s implementation of the MIL model leverages an attention mechanism that offers deeper biological insights (Figure 7). By generating a heatmap that visualizes which tumor areas were most important in determining the final, slide-level prediction, this algorithm provides explainability: the output is not just whether the tumor is likely to be driven by MMR deficiency, but also where the tumor is most likely to be MMR deficient.
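The mechanics behind such attention heatmaps can be sketched with attention-based MIL pooling: each tile of the slide is scored, the scores are normalized into attention weights, and the weights both pool tile features into a slide-level embedding and color the heatmap. The weights below are random stand-ins for trained parameters; this is an illustration of the general technique, not Reveal's implementation.

```python
import numpy as np

def softmax(x):
    """Numerically stable softmax over a 1-D array."""
    e = np.exp(x - x.max())
    return e / e.sum()

def mil_attention_pool(tile_features, V, w):
    """Attention-based MIL pooling.

    tile_features: (n_tiles, d) per-tile feature vectors
    V: (d, h) and w: (h,) attention parameters (trained in practice)
    Returns the slide-level embedding and the per-tile attention weights
    (the quantity visualized as a heatmap).
    """
    scores = np.tanh(tile_features @ V) @ w      # one scalar score per tile
    attention = softmax(scores)                  # non-negative, sums to 1
    slide_embedding = attention @ tile_features  # attention-weighted average
    return slide_embedding, attention
```

A classifier on the slide embedding then yields the binary MMR label, while the attention vector is mapped back onto tile locations to produce the heatmap.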

Modularized Design (i.e. Plug and Play)

Each element described above can be packaged into independent sub-pipelines with standardized inputs and outputs. These modularized components can be added, removed, or rearranged to address previously unseen datasets and derive novel analysis endpoints. Figure 8 illustrates how analysis sub-pipelines can be assembled into an end-to-end pipeline, thereby achieving diverse analysis goals. We have also developed a graphical user interface with “drag and drop” utility, making the framework accessible to non-coding users.
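The plug-and-play assembly can be sketched as a registry of named sub-pipelines resolved into a single end-to-end callable. The registry, step names, and payload keys below are illustrative, not the framework's actual interface.

```python
# Hypothetical registry of sub-pipelines, each a callable with a
# standardized dict payload; names and keys are illustrative only.
REGISTRY = {
    "tissue_detection": lambda img: {"image": img, "tissue_mask": "mask"},
    "roi_segmentation": lambda d: {**d, "roi": "tumor"},
    "summary_stats":    lambda d: {**d, "stats": {"roi": d["roi"]}},
}

def build_pipeline(step_names):
    """Resolve sub-pipeline names and return one end-to-end callable."""
    steps = [REGISTRY[name] for name in step_names]

    def run(data):
        for step in steps:
            data = step(data)
        return data

    return run
```

Because every sub-pipeline consumes and produces the same payload shape, reordering or substituting steps is a matter of editing the name list, which is what a drag-and-drop interface can expose to non-coding users.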

Current Capabilities and Beyond

Figure 8. An end-to-end analysis pipeline (each row) encompasses a custom sequence of sub-pipelines that result in quantitative data describing the feature(s) of interest. The modularity intrinsic to pipeline and sub-pipeline design allows flexibility in re-using previous solutions, speeding development of future digital assays.

Reveal has implemented a rapidly expanding library of over 120 operations in the pipeline framework, which together offer comprehensive functionality spanning the computer vision and ML/AI domains. The examples above emphasize the pipeline framework’s application to whole slide image analysis within digital pathology. However, the generalized implementation of machine learning and deep learning tools within the pipeline permits integrated analysis of multiple forms of data. Extending the pipeline’s data reader functionality to other high throughput data sources, including genomics, proteomics, and metabolomics, positions the pipeline for novel multi-omic data analysis solutions.

Together, imageDx and the pipeline framework offer tools that are fundamental to streamlined design of whole slide image analysis pipelines. Their integration accelerates bespoke model development and facilitates adaptability to novel image datasets, research questions, and diagnostic endpoints.

About the Author

Stacy Littlechild is Associate Director of Digital Assay Development at Reveal Biosciences. Prior to her current role in the digital pathology space, Stacy studied corneal extracellular matrix biology and intracellular signaling in cancer using many forms of microscopy and computational image analysis. Stacy holds a PhD in Vision Science from Cardiff University, where she was both an International Scholarship recipient and a President’s Research Scholar. Her postdoctoral training was at the Salk Institute for Biological Studies.
