HistoQC: An open-source quality control tool for digital pathology slides



Our paper is out in: Journal of Clinical Oncology: Clinical Cancer Informatics

Purpose: Digital pathology (DP), referring to the digitization of tissue slides, is beginning to change the landscape of clinical diagnostic workflows and has engendered active research within the area of computational pathology. One of the challenges in DP is the presence of artifacts and batch effects, unintentionally introduced during both routine slide preparation (e.g., staining, tissue folding) and digitization (e.g., blurriness, variations in contrast and hue). Manual review of glass and digital slides is laborious, qualitative, and subject to intra/inter-reader variability. There is thus a critical need for a reproducible automated approach for precisely localizing artifacts in order to identify slides which need to be reproduced or regions which should be avoided during computational analysis.

Methods: Here we present HistoQC, a tool for rapidly performing quality control to not only identify and delineate artifacts but also discover cohort-level “outliers” (e.g., slides stained darker/lighter than other slides in the cohort). This open-source tool employs a combination of image metrics (e.g., color histograms, brightness, contrast), features (e.g., edge detectors), and supervised classifiers (e.g., pen detection) to identify artifact-free regions on digitized slides. These regions and metrics are presented to the user via an interactive graphical user interface, facilitating artifact detection through real-time visualization and filtering. These same metrics afford users the opportunity to explicitly define acceptable tolerances for their workflows.

Results: HistoQC’s output on n=450 slides from The Cancer Genome Atlas (TCGA) was reviewed by 2 pathologists and found to be suitable for computational analysis over 95% of the time.

Conclusion: These results suggest that HistoQC could provide an automated, quantifiable, quality control process for identifying artifacts and measuring slide quality, in turn helping to improve both the repeatability and robustness of DP workflows.
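
To make the cohort-level “outlier” idea from the Methods paragraph concrete, here is a minimal sketch. It is not HistoQC's actual implementation: the function names, the robust z-score rule, and the 3.5 threshold are all assumptions chosen for illustration. It computes brightness and contrast per slide thumbnail and flags slides whose brightness sits far from the cohort median.

```python
from statistics import fmean, median, pstdev


def brightness_contrast(pixels):
    """Return (brightness, contrast) for an iterable of (r, g, b) tuples.

    Brightness is the mean grayscale intensity scaled to [0, 1];
    contrast is the standard deviation of that intensity.
    Hypothetical helper, not part of HistoQC.
    """
    gray = [(r + g + b) / (3 * 255) for r, g, b in pixels]
    return fmean(gray), pstdev(gray)


def flag_outliers(values, z_thresh=3.5):
    """Return indices of values far from the cohort median.

    Uses a robust z-score based on the median absolute deviation (MAD).
    The 3.5 cutoff is an assumed default, not a HistoQC setting.
    """
    values = list(values)
    med = median(values)
    mad = median([abs(v - med) for v in values])
    if mad == 0:
        # All values (or at least half of them) are identical; nothing to flag.
        return []
    return [
        i for i, v in enumerate(values)
        if abs(0.6745 * (v - med) / mad) > z_thresh
    ]
```

In practice one would feed `brightness_contrast` a low-resolution thumbnail of each slide and run `flag_outliers` on the per-slide brightness values; a slide stained much darker than the rest of the cohort would then be returned for manual review.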

Manuscript available here: HistoQC_w_supplemental

Code available here: HistoQC Github repo

Wiki available here: HistoQC Wiki

Slide Repository is available here: HistoQCRepo

More in-depth tutorial to follow!



10 thoughts on “HistoQC: An open-source quality control tool for digital pathology slides”

  1. Hi,
    Is HistoQC compatible with JPEG?
    Is there any way to convert JPEG files into SVS files?


    1. at the moment, no. not because of a technical issue, but because a JPEG file does not contain the necessary metadata needed for histoQC to work. For example, SVS files (and other whole slide image formats) have magnification embedded in the header, which allows histoqc to appropriately scale the image. To my knowledge, “regular” JPEGs do not contain this information, though some TIF formats may. Since we use openslide as a backend, histoqc supports any of the formats it supports: https://openslide.org/formats/

      1. Using a scanner/resolution table as a fallback when openslide cannot find the mpp helped in my case. If openslide cannot read a non-DP format, pyvips was of help: it can read the DP formats as well as more non-DP formats (but it got the mpp wrong for a TIFF container where openslide got it right…). Not sure yet whether an interactive scanner model input would be easy to implement.

        1. cool! do you have some code which you can share?

          one difficult component overall is that occasionally the datasets are heterogeneously scanned. for example, TCGA BRCA has a number of images scanned at 20x and a number at 40x, so I didn’t want to implement a general default, as it would be wrong in some cases and likely give false confidence to users. ultimately i would like to implement a magnification prediction algorithm to fall back to, but how to do so in a high throughput robust manner is still not obvious to me

  2. I ran the command as suggested in the instructions, e.g.:
    HistoQC> python qc_pipeline.py -c config.ini -n 4 remote_file_location/*.svs
    and always encountered errors on the symlink part. Is that part necessary, or can I just turn it off?

    1. sorry, not sure i understand. are you asking to quality control those types of images, or to perform the computation itself?

  3. Great work! In your paper, you mention that the tool is made available with a public repository of slides containing artifacts. I wasn’t able to find where these images are. Could you point us to those?
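
The scanner/resolution-table fallback described in the first comment thread could look roughly like the sketch below. This is an illustration only, not code from HistoQC or from the commenter: `resolve_mpp`, the `scanner_key` argument, and the table contents are hypothetical. The only real name used is the `openslide.mpp-x` property key that openslide exposes. In line with the concern about giving users false confidence, the function returns `None` rather than guessing when neither the header nor the table provides a value.

```python
def resolve_mpp(properties, scanner_key=None, fallback_table=None):
    """Return (mpp, source) for a slide, preferring the value in the header.

    properties     -- mapping of slide properties, e.g. OpenSlide(path).properties
    scanner_key    -- hypothetical user-supplied label for the scanner/objective
    fallback_table -- hypothetical mapping from scanner_key to microns per pixel

    Returns (None, "unknown") when no trustworthy value is found.
    """
    raw = properties.get("openslide.mpp-x")
    if raw is not None:
        try:
            mpp = float(raw)
            if mpp > 0:
                return mpp, "header"
        except ValueError:
            # Header value present but not numeric; fall through to the table.
            pass

    if fallback_table and scanner_key in fallback_table:
        return fallback_table[scanner_key], "fallback-table"

    # Deliberately no global default: a wrong guess is worse than no answer.
    return None, "unknown"
```

A call might look like `resolve_mpp(slide.properties, "scanner-a-40x", {"scanner-a-40x": 0.25})`, where both the key and the 0.25 value are placeholders to be replaced with measurements for the scanners actually in use. Returning the source alongside the value lets a pipeline log or reject slides whose resolution did not come from the file header.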

