HistoQC: An open-source quality control tool for digital pathology slides

[Figure: example slides showing a crack, pen marks, and an air bubble]

Our paper is out in: Journal of Clinical Oncology: Clinical Cancer Informatics

Purpose: Digital pathology (DP), referring to the digitization of tissue slides, is beginning to change the landscape of clinical diagnostic workflows and has engendered active research within the area of computational pathology. One of the challenges in DP is the presence of artifacts and batch effects, unintentionally introduced during both routine slide preparation (e.g., staining, tissue folding) and digitization (e.g., blurriness, variations in contrast and hue). Manual review of glass and digital slides is laborious, qualitative, and subject to intra/inter-reader variability. There is thus a critical need for a reproducible automated approach for precisely localizing artifacts in order to identify slides which need to be reproduced or regions which should be avoided during computational analysis.

Methods: Here we present HistoQC, a tool for rapidly performing quality control to not only identify and delineate artifacts but also discover cohort level “outliers” (e.g., slides stained darker/lighter than other slides in the cohort). This open-source tool employs a combination of image metrics (e.g., color histograms, brightness, contrast), features (e.g., edge detectors), and supervised classifiers (e.g., pen detection) to identify artifact free regions on digitized slides. These regions and metrics are presented to the user via an interactive graphical user interface, facilitating artifact detection through real-time visualization and filtering. These same metrics afford users the opportunity to explicitly define acceptable tolerances for their workflows.

Results: HistoQC’s output on n=450 slides from The Cancer Genome Atlas (TCGA) was reviewed by 2 pathologists and found to be suitable for computational analysis over 95% of the time.

Conclusion: These results suggest that HistoQC could provide an automated, quantifiable, quality control process for identifying artifacts and measuring slide quality, in turn helping to improve both the repeatability and robustness of DP workflows.

Manuscript available here: HistoQC_w_supplemental

Code available here: HistoQC Github repo

Wiki available here: HistoQC Wiki

Slide Repository is available here: HistoQCRepo

More in-depth tutorial to follow!

18 thoughts on “HistoQC: An open-source quality control tool for digital pathology slides”

  1. Hi,
    Is HistoQC compatible with JPEG files?
    Is there any way to convert JPEG files into SVS files?

    Thanks

    1. at the moment, no. this is not due to a technical issue, but because a JPEG file does not contain the metadata that HistoQC needs to work. For example, SVS files (and other whole slide image formats) have the magnification embedded in the header, which allows HistoQC to scale the image appropriately. To my knowledge, “regular” JPEGs do not contain this information, though some TIF formats may. Since we use openslide as a backend, HistoQC supports any of the formats that openslide supports: https://openslide.org/formats/

      1. Using a scanner/resolution table as a fallback when openslide cannot find the mpp (microns per pixel) helped in my case. When openslide could not read a non-DP format, pyvips helped, since it can read the DP formats as well as more non-DP formats (although it got the mpp wrong for one TIFF container where openslide got it right…). I am not sure yet whether an interactive scanner model input would be easy to implement.

        1. cool! do you have some code which you can share?

          one difficult component overall is that occasionally the datasets are heterogeneously scanned. for example, TCGA BRCA has a number of images scanned at 20x and a number at 40x, so I didn’t want to implement a general default, as it would be wrong in some cases and likely give false confidence to users. ultimately i would like to implement a magnification prediction algorithm to fall back to, but how to do so in a high throughput robust manner is still not obvious to me
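
The fallback idea discussed in this thread can be sketched roughly as follows. This is not HistoQC's code. It assumes the slide metadata is available as a mapping using OpenSlide's standard property keys `openslide.objective-power` and `openslide.mpp-x`. The function name, the scanner table, and its values are hypothetical placeholders, and the `10 / mpp` conversion is only a rule of thumb (about 0.25 microns per pixel at 40x). In line with the concern about heterogeneously scanned cohorts, the sketch returns `None` rather than guessing when nothing is known, and the scanner table is only consulted when the caller names a model explicitly.

```python
from typing import Mapping, Optional

# Hypothetical fallback table: scanner model -> microns per pixel at full resolution.
# The entries are illustrative placeholders, not vendor specifications.
SCANNER_MPP_FALLBACK = {
    "example-scanner-40x": 0.25,
    "example-scanner-20x": 0.5,
}


def estimate_magnification(
    properties: Mapping[str, str],
    scanner_model: Optional[str] = None,
) -> Optional[float]:
    """Return an estimated objective magnification, or None if it is unknown."""
    # 1. Prefer the magnification stored in the slide header.
    power = properties.get("openslide.objective-power")
    if power:
        return float(power)

    # 2. Otherwise use the microns-per-pixel value from the header, if present.
    mpp = properties.get("openslide.mpp-x")

    # 3. Only consult the scanner table when the caller explicitly names a model.
    if mpp is None and scanner_model is not None:
        mpp = SCANNER_MPP_FALLBACK.get(scanner_model)

    if mpp is None:
        return None  # refuse to guess rather than give false confidence

    # Rule-of-thumb conversion (assumption): ~0.25 um/pixel corresponds to ~40x.
    return 10.0 / float(mpp)
```

With openslide-python installed, the `properties` attribute of an `openslide.OpenSlide` object is such a mapping and could be passed in directly.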

  2. I ran the command as suggested in the instructions, e.g.:
    HistoQC> python qc_pipeline.py -c config.ini -n 4 remote_file_location/*.svs
    and always encountered errors in the symlink step. Is that step necessary, or can I just turn it off?

    1. sorry, not sure i understand. are you asking how to quality control those types of images, or how to perform the computation itself?
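
For readers who hit the same symlink error: in the log posted further down in these comments, the pipeline reports "Done" before the `os.symlink` call raises `OSError: symbolic link privilege not held`, so the failure occurs after the slides have been analyzed. One possible way to keep such a failure from ending the run with a traceback is to wrap the call. The sketch below is an illustration only, not HistoQC's implementation; the helper name `try_symlink` is hypothetical, while `origin` and `target` follow the names visible in that traceback.

```python
import logging
import os


def try_symlink(origin: str, target: str) -> bool:
    """Create a directory symlink at `target` pointing to `origin`.

    Returns True on success and False (after logging a warning) on failure,
    instead of letting the exception propagate.
    """
    try:
        os.symlink(origin, target, target_is_directory=True)
    except OSError as exc:
        # Covers a missing symlink privilege on Windows as well as a target
        # path that already exists.
        logging.warning("Could not create symlink %s -> %s: %s", target, origin, exc)
        return False
    logging.info("Symlink to output directory created")
    return True
```

On Windows, creating symbolic links typically requires an elevated prompt or Developer Mode, which is a likely reason the call succeeds on some machines and fails on others.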

  3. Great work! In your paper, you mention that the tool is made available with a public repository of slides containing artifacts. I wasn’t able to find where these images are. Could you point us to those?

    Thanks!!

  4. Great. I am hoping to use it for my slides. However, I have problems running it properly. I downloaded some svs files to test this pipeline from the provided link.
    When I run this pipeline, I get the following error:
    2019-11-13 15:42:12,463 – WARNING – Lossy conversion from int64 to uint8. Range [0, 255]. Convert image to uint8 prior to saving to suppress this warning.
    .
    .
    .
    2019-11-13 15:42:20,841 – ERROR – /Users/admin/Documents/HistoQC-master/TCGA-A1-A0SF-01Z-00-DX1.7F252D89-EA78-419F-A969-1B7313D77499.svs – – Error analyzing file (skipping): Inappropriate argument type. (“remove_small_holes() got an unexpected keyword argument ‘min_size'”, ‘/Users/admin/Documents/HistoQC-master/TCGA-A1-A0SF-01Z-00-DX1.7F252D89-EA78-419F-A969-1B7313D77499.svs’, ‘')

    and similar warnings and subsequent errors happen for the other modules, leading to skipping them.

    1. make sure you’re using the package versions specified in requirements.txt. the parameters of some of the library functions were changed recently, so we had to realign our code and freeze the versions

      1. OK. When I was installing the packages, I got this error:

        ERROR: Invalid requirement: 'scikit-image=0.15.0' (from line 2 of requirements.txt)

        so I removed the version number!

          1. Thanks so much. The problem was solved. But again something else emerged:

            2019-11-13 18:04:35,812 – ERROR – /Users/admin/Documents/HistoQC-master/TCGA-A1-A0SM-01Z-00-DX1.AD503DBD-4D93-4476-B467-F091254FDF78.svs – – Error analyzing file (skipping): Inappropriate argument value (of correct type). (‘zero-size array to reduction operation maximum which has no identity’, ‘/Users/admin/Documents/HistoQC-master/TCGA-A1-A0SM-01Z-00-DX1.AD503DBD-4D93-4476-B467-F091254FDF78.svs’, ‘')
            2019-11-13 18:04:35,848 - INFO - ------------Done---------

            2019-11-13 18:04:35,849 - INFO - These images failed (available also in error.log), warnings are listed in warnings column in output:
            2019-11-13 18:04:35,849 - INFO - /Users/admin/Documents/HistoQC-master/TCGA-A1-A0SM-01Z-00-DX1.AD503DBD-4D93-4476-B467-F091254FDF78.svs Inappropriate argument value (of correct type). ('zero-size array to reduction operation maximum which has no identity', '/Users/admin/Documents/HistoQC-master/TCGA-A1-A0SM-01Z-00-DX1.AD503DBD-4D93-4476-B467-F091254FDF78.svs', '
            ')
            2019-11-13 18:04:35,849 - INFO - Symlink to output directory created

          2. I suspect there is something wrong with your installation. I downloaded the same svs file, and cloned the github repository and it worked to completion without issue:

            C:\temp\HistoQC>python qc_pipeline.py *.svs
            2019-11-13 20:30:48,632 – WARNING – Configuration file not set (--config), using default: C:\temp\HistoQC/config.ini
            2019-11-13 20:30:48,632 – INFO – Pipeline will use these steps:
            2019-11-13 20:30:48,632 – INFO – BasicModule getBasicStats
            2019-11-13 20:30:49,076 – INFO – ClassificationModule byExampleWithFeatures:coverslip_edge
            2019-11-13 20:30:49,220 – INFO – LightDarkModule getIntensityThresholdPercent:tissue
            2019-11-13 20:30:49,220 – INFO – LightDarkModule getIntensityThresholdPercent:darktissue
            2019-11-13 20:30:49,220 – INFO – BubbleRegionByRegion detectSmoothness
            2019-11-13 20:30:49,225 – INFO – MorphologyModule removeFatlikeTissue
            2019-11-13 20:30:49,230 – INFO – MorphologyModule fillSmallHoles
            2019-11-13 20:30:49,230 – INFO – MorphologyModule removeSmallObjects
            2019-11-13 20:30:49,230 – INFO – BlurDetectionModule identifyBlurryRegions
            2019-11-13 20:30:49,235 – INFO – BasicModule finalProcessingSpur
            2019-11-13 20:30:49,235 – INFO – BasicModule finalProcessingArea
            2019-11-13 20:30:49,235 – INFO – HistogramModule compareToTemplates
            2019-11-13 20:30:49,235 – INFO – HistogramModule getHistogram
            2019-11-13 20:30:49,235 – INFO – BrightContrastModule getContrast
            2019-11-13 20:30:49,240 – INFO – BrightContrastModule getBrightnessGray
            2019-11-13 20:30:49,240 – INFO – BrightContrastModule getBrightnessByChannelinColorSpace:RGB
            2019-11-13 20:30:49,240 – INFO – BrightContrastModule getBrightnessByChannelinColorSpace:YUV
            2019-11-13 20:30:49,240 – INFO – DeconvolutionModule seperateStains
            2019-11-13 20:30:49,245 – INFO – SaveModule saveFinalMask
            2019-11-13 20:30:49,245 – INFO – SaveModule saveThumbnails
            2019-11-13 20:30:49,245 – INFO – BasicModule finalComputations
            2019-11-13 20:30:49,245 – INFO – ———-
            2019-11-13 20:30:49,245 – INFO – Number of files detected by pattern: 1
            2019-11-13 20:30:49,250 – INFO – —–Working on: C:\temp\HistoQC\TCGA-A1-A0SM-01Z-00-DX1.AD503DBD-4D93-4476-B467-F091254FDF78.svs 1 of 1
            2019-11-13 20:30:49,305 – INFO – TCGA-A1-A0SM-01Z-00-DX1.AD503DBD-4D93-4476-B467-F091254FDF78.svs – getMag
            2019-11-13 20:30:51,348 – INFO – TCGA-A1-A0SM-01Z-00-DX1.AD503DBD-4D93-4476-B467-F091254FDF78.svs – getBasicStats
            2019-11-13 20:30:51,349 – INFO – TCGA-A1-A0SM-01Z-00-DX1.AD503DBD-4D93-4476-B467-F091254FDF78.svs – ClassificationModule.byExample: coverslip_edge
            2019-11-13 20:30:51,351 – INFO – TCGA-A1-A0SM-01Z-00-DX1.AD503DBD-4D93-4476-B467-F091254FDF78.svs – Training model ClassificationModule.byExample:coverslip_edge
            2019-11-13 20:30:55,094 – INFO – TCGA-A1-A0SM-01Z-00-DX1.AD503DBD-4D93-4476-B467-F091254FDF78.svs – Training model ClassificationModule.byExample:coverslip_edge….done
            2019-11-13 20:31:12,236 – WARNING – Lossy conversion from int32 to uint8. Range [0, 255]. Convert image to uint8 prior to saving to suppress this warning.
            2019-11-13 20:31:12,817 – INFO – TCGA-A1-A0SM-01Z-00-DX1.AD503DBD-4D93-4476-B467-F091254FDF78.svs – LightDarkModule.getIntensityThresholdPercent: nonwhite
            2019-11-13 20:31:13,755 – WARNING – Lossy conversion from int32 to uint8. Range [0, 255]. Convert image to uint8 prior to saving to suppress this warning.
            2019-11-13 20:31:15,404 – INFO – TCGA-A1-A0SM-01Z-00-DX1.AD503DBD-4D93-4476-B467-F091254FDF78.svs – LightDarkModule.getIntensityThresholdPercent: dark
            2019-11-13 20:31:16,304 – WARNING – Lossy conversion from int32 to uint8. Range [0, 255]. Convert image to uint8 prior to saving to suppress this warning.
            2019-11-13 20:31:16,723 – INFO – TCGA-A1-A0SM-01Z-00-DX1.AD503DBD-4D93-4476-B467-F091254FDF78.svs – BubbleRegionByRegion.detectSmoothness
            2019-11-13 20:31:19,313 – WARNING – Lossy conversion from int32 to uint8. Range [0, 255]. Convert image to uint8 prior to saving to suppress this warning.
            2019-11-13 20:31:19,907 – INFO – TCGA-A1-A0SM-01Z-00-DX1.AD503DBD-4D93-4476-B467-F091254FDF78.svs – removeFatlikeTissue
            2019-11-13 20:31:21,426 – WARNING – Lossy conversion from int32 to uint8. Range [0, 255]. Convert image to uint8 prior to saving to suppress this warning.
            2019-11-13 20:31:21,863 – INFO – TCGA-A1-A0SM-01Z-00-DX1.AD503DBD-4D93-4476-B467-F091254FDF78.svs – fillSmallHoles
            2019-11-13 20:31:22,059 – WARNING – Lossy conversion from int32 to uint8. Range [0, 255]. Convert image to uint8 prior to saving to suppress this warning.
            2019-11-13 20:31:22,773 – INFO – TCGA-A1-A0SM-01Z-00-DX1.AD503DBD-4D93-4476-B467-F091254FDF78.svs – removeSmallObjects
            2019-11-13 20:31:22,961 – WARNING – Lossy conversion from int32 to uint8. Range [0, 255]. Convert image to uint8 prior to saving to suppress this warning.
            2019-11-13 20:31:23,961 – INFO – TCGA-A1-A0SM-01Z-00-DX1.AD503DBD-4D93-4476-B467-F091254FDF78.svs – identifyBlurryRegions
            2019-11-13 20:32:20,979 – WARNING – Lossy conversion from int32 to uint8. Range [0, 255]. Convert image to uint8 prior to saving to suppress this warning.
            2019-11-13 20:32:21,826 – INFO – TCGA-A1-A0SM-01Z-00-DX1.AD503DBD-4D93-4476-B467-F091254FDF78.svs – finalProcessingSpur
            2019-11-13 20:32:22,398 – WARNING – Lossy conversion from int32 to uint8. Range [0, 255]. Convert image to uint8 prior to saving to suppress this warning.
            2019-11-13 20:32:22,849 – INFO – TCGA-A1-A0SM-01Z-00-DX1.AD503DBD-4D93-4476-B467-F091254FDF78.svs – finalProcessingArea
            2019-11-13 20:32:23,027 – WARNING – Lossy conversion from int32 to uint8. Range [0, 255]. Convert image to uint8 prior to saving to suppress this warning.
            2019-11-13 20:32:23,399 – INFO – TCGA-A1-A0SM-01Z-00-DX1.AD503DBD-4D93-4476-B467-F091254FDF78.svs – compareToTemplates
            2019-11-13 20:32:23,601 – INFO – TCGA-A1-A0SM-01Z-00-DX1.AD503DBD-4D93-4476-B467-F091254FDF78.svs – getHistogram
            2019-11-13 20:32:23,987 – INFO – TCGA-A1-A0SM-01Z-00-DX1.AD503DBD-4D93-4476-B467-F091254FDF78.svs – getContrast
            2019-11-13 20:32:24,312 – INFO – TCGA-A1-A0SM-01Z-00-DX1.AD503DBD-4D93-4476-B467-F091254FDF78.svs – getContrast
            2019-11-13 20:32:24,628 – INFO – TCGA-A1-A0SM-01Z-00-DX1.AD503DBD-4D93-4476-B467-F091254FDF78.svs – getContrast
            2019-11-13 20:32:24,678 – INFO – TCGA-A1-A0SM-01Z-00-DX1.AD503DBD-4D93-4476-B467-F091254FDF78.svs – getContrast
            2019-11-13 20:32:24,965 – INFO – TCGA-A1-A0SM-01Z-00-DX1.AD503DBD-4D93-4476-B467-F091254FDF78.svs – seperateStains
            2019-11-13 20:32:25,618 – WARNING – Lossy conversion from float64 to uint8. Range [0, 1]. Convert image to uint8 prior to saving to suppress this warning.
            2019-11-13 20:32:26,111 – WARNING – Lossy conversion from float64 to uint8. Range [0, 1]. Convert image to uint8 prior to saving to suppress this warning.
            2019-11-13 20:32:26,614 – WARNING – Lossy conversion from float64 to uint8. Range [0, 1]. Convert image to uint8 prior to saving to suppress this warning.
            2019-11-13 20:32:26,947 – INFO – TCGA-A1-A0SM-01Z-00-DX1.AD503DBD-4D93-4476-B467-F091254FDF78.svs – saveUsableRegion
            2019-11-13 20:32:27,004 – WARNING – Lossy conversion from int32 to uint8. Range [0, 255]. Convert image to uint8 prior to saving to suppress this warning.
            2019-11-13 20:32:28,095 – WARNING – Lossy conversion from float64 to uint8. Range [0, 1]. Convert image to uint8 prior to saving to suppress this warning.
            2019-11-13 20:32:43,462 – INFO – TCGA-A1-A0SM-01Z-00-DX1.AD503DBD-4D93-4476-B467-F091254FDF78.svs – saveThumbnail
            2019-11-13 20:32:56,325 – INFO – TCGA-A1-A0SM-01Z-00-DX1.AD503DBD-4D93-4476-B467-F091254FDF78.svs – creating image thumb of size 500.0
            2019-11-13 20:32:57,158 – INFO – ————Done———

            2019-11-13 20:32:57,158 – INFO – These images failed (available also in error.log), warnings are listed in warnings column in output:
            Traceback (most recent call last):
              File "qc_pipeline.py", line 263, in <module>
                os.symlink( origin, target, target_is_directory=True)
            OSError: symbolic link privilege not held

            C:\temp\HistoQC>
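
A note on the installation error earlier in this thread: the message `Invalid requirement: 'scikit-image=0.15.0'` is consistent with a single `=` where pip expects the `==` version operator. Removing the version number avoids that error, but it also drops the pin that the first reply recommends keeping. A minimal sketch of a helper that rewrites such lines instead is shown below. The function name `fix_requirement_line` is hypothetical and not part of HistoQC or pip, and the pattern only handles plain `name=version` lines.

```python
import re

# Matches "name=version" with exactly one "=", e.g. "scikit-image=0.15.0".
# Lines that already use "==", ">=", "<=", etc. do not match.
_SINGLE_EQUALS = re.compile(
    r"^\s*([A-Za-z0-9][A-Za-z0-9._-]*)\s*=\s*([0-9][^\s=]*)\s*$"
)


def fix_requirement_line(line: str) -> str:
    """Rewrite 'pkg=1.2.3' as 'pkg==1.2.3'; return other lines stripped but unchanged."""
    match = _SINGLE_EQUALS.match(line)
    if match:
        return f"{match.group(1)}=={match.group(2)}"
    return line.strip()


if __name__ == "__main__":
    for raw in ["scikit-image=0.15.0", "scikit-image==0.15.0", "numpy>=1.16"]:
        print(fix_requirement_line(raw))
```

Keeping the pinned version matters here because the `remove_small_holes() got an unexpected keyword argument 'min_size'` error quoted above is the kind of failure that appears when a newer library release renames a function parameter.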

  5. Thanks Andrew for sharing this pipeline. I have tried it with your test data. But my images are in czi format. Do you have any suggestions for this format of data?
    Thanks.

    1. unfortunately that’s a bit tricky since openslide doesn’t support it. is there some type of converter available to convert it to a more popular format?
