A Novel Pipeline Framework for Digital Assay Development in Pathology and Multi-omics – Part One

Analyzing whole slide pathology images at scale is challenging due to the large size of the data and the complexity of the models required to quantify a broad range of tissue types and diseases. Reveal Biosciences has developed a novel pipeline framework to accelerate digital assay development and increase efficiency. To date, Reveal has implemented over 120 operators in the framework, which together offer comprehensive functionality spanning the computer vision and ML/AI domains. Historically, whole slide image analysis pipelines have relied on human input and local hardware to conduct analyses. In comparison, Reveal’s pipeline framework enables AI-powered, end-to-end analysis using distributed cloud computing resources. The generalized implementation of machine learning and deep learning tools within the pipeline permits integrated analysis of multiple forms of data with little additional effort. Extending the pipeline’s data reader functionality to other high-throughput data sources such as genomics, proteomics, and metabolomics positions the pipeline and Reveal for novel multi-omic data analysis solutions.

What is Reveal’s Proprietary Pipeline Framework?

Reveal’s proprietary pipeline framework is an in-house set of software tools that facilitates streamlined development of reproducible, AI-powered whole slide image analysis. The pipeline’s integration into Reveal’s cloud-based whole slide imaging platform, imageDx™, means that common logistical and technical challenges associated with digital image analysis are solved in a modular, reusable framework. Together, the pipeline and platform shift effort away from resolving data ingestion, organization, and labeling challenges and toward rapidly refining bespoke algorithms until they perform robustly.

How Does the Pipeline Framework Add Value?

There are many technical challenges associated with analyzing whole slide image datasets. For example:

  • Scalability: WSIs can reach multiple gigabytes per image, and digital pathology experiments can reach hundreds to thousands of images. This scale requires vast amounts of storage space and compute to extract relevant data and distill meaningful insights.
  • Adaptability: Not all whole slide images are created equal. Depending on the brand of the whole slide scanner, pixel information may take on a different hue or represent a different physical size. Although these differences may be imperceptible to the eye, they can affect the performance of AI-driven analyses and lead to inaccurate output metrics if they are not carefully tracked and accounted for during analysis.
  • Variability: Preparing biological specimens for whole slide imaging involves a number of laboratory techniques, including excision, fixation, embedding, sectioning, staining, and finally, imaging. Despite tight control of specimen preparation protocols, each of these pre-analytical steps has the potential to introduce variability and artifacts into the appearance of the sample in the final image.
Figure 1. The panels above show the same region within an H&E-stained specimen that was imaged on three different whole slide scanners. Variation in the hue of the hematoxylin and eosin stains is visible and can affect image analysis performance.
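
To make the "different physical size" point concrete, here is a minimal sketch of why scanner resolution has to be tracked. It converts a pixel count into a physical area using a microns-per-pixel value. The function name and the two resolutions are hypothetical values chosen for illustration; they are not taken from Reveal's pipeline or from any particular scanner.

```python
def region_area_um2(pixel_count: int, microns_per_pixel: float) -> float:
    """Convert a pixel count into a physical area in square microns."""
    if microns_per_pixel <= 0:
        raise ValueError("microns_per_pixel must be positive")
    # Each pixel covers a square of side `microns_per_pixel`.
    return pixel_count * microns_per_pixel ** 2


# Hypothetical resolutions for two scanners (assumed values for illustration).
area_a = region_area_um2(10_000, 0.25)  # 625.0 square microns
area_b = region_area_um2(10_000, 0.50)  # 2500.0 square microns

print(area_a, area_b)
```

Under these assumed resolutions, the same 10,000-pixel region corresponds to a fourfold difference in physical area, so any area-based metric would be wrong unless the resolution of each image is read and applied.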

Reveal’s pipeline was designed with these challenges in mind, and seamlessly integrates with our vendor-agnostic whole slide image hosting platform, imageDx™, to overcome common obstacles in digital pathology algorithm development.

Figure 2. Whereas many whole slide image analysis software packages limit processing to one image at a time (top), imageDx™ leverages distributed cloud computing resources that process many images in parallel (bottom), substantially reducing analysis runtime.
  • Scalability: To address organization and storage needs, sources of data including images, annotations, and other relevant clinical metadata are stored in a unified database that is readily accessible to the pipeline. Additionally, whereas many existing digital pathology software platforms process each WSI in sequence and rely on the processing power built into a local computer, imageDx™ recruits computational resources at the time of digital assay launch. This allows complex algorithms designed in the pipeline to run in parallel across a distributed cluster of cloud-based servers, greatly reducing algorithm runtime (Figure 2).
  • Adaptability: Built into imageDx™ and the pipeline are custom image readers that support many WSI formats across the digital pathology field. Pipelines in our system can operate on images generated in-house or can be tailored to externally sourced images with minimal additional effort.
  • Variability: The disparity in color and feature appearance across a biological dataset often means that approaches to quantifying these features using traditional image processing methods are less consistent than desired, or not possible at all. While the pipeline contains a suite of image processing functions to support algorithm development, it also contains multiple deep learning model backbones for custom, AI-powered digital assay design (more on this in the next part).
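
The difference between sequential and parallel processing described under Scalability can be sketched in a few lines. This is only a local illustration, not Reveal's implementation: a thread pool on one machine stands in for a distributed cluster of cloud servers, and `analyze_slide` is a hypothetical placeholder for a real per-slide analysis.

```python
from concurrent.futures import ThreadPoolExecutor


def analyze_slide(slide_id: str) -> dict:
    """Hypothetical stand-in for a per-slide analysis; returns a dummy metric."""
    # A real analysis would read the image and run a model here.
    return {"slide_id": slide_id, "id_length": len(slide_id)}


def run_sequential(slide_ids):
    """Process slides one at a time, in order."""
    return [analyze_slide(s) for s in slide_ids]


def run_parallel(slide_ids, max_workers=4):
    """Process slides concurrently; map() returns results in input order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(analyze_slide, slide_ids))


slide_ids = [f"slide_{i:03d}" for i in range(8)]

# Both strategies produce the same results; only the scheduling differs.
assert run_parallel(slide_ids) == run_sequential(slide_ids)
```

Because each slide is analyzed independently of the others, the work divides cleanly across workers, which is what makes this kind of workload a good fit for a distributed cluster.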

In Part Two of our series, Dr. Stacy Littlechild will discuss the components of a digital assay to process and analyze whole slide images of tissue.

About the Author

Stacy Littlechild is Associate Director of Digital Assay Development at Reveal Biosciences. Prior to her current role in the digital pathology space, Stacy studied corneal extracellular matrix biology and intracellular signaling in cancer using many forms of microscopy and computational image analysis. Stacy holds a PhD in Vision Science from Cardiff University, where she was both an International Scholarship recipient and a President’s Research Scholar. Her postdoctoral training was at the Salk Institute for Biological Studies.
