Background on Color Calibration
Digital whole slide image scanners are designed to take stained tissue on glass slides and digitize it into bytes for use in the digital world. The process by which slide scanners perform this operation does not produce a perfect digital equivalent of the original slide, as the hardware involved (LED/bulb, camera sensor, quantizer) can introduce biases during the sampling process. For example, different camera sensors may detect colors with different levels of sensitivity and accuracy, resulting in similar but not perfect representations of the associated real-world subjects.
Concretely, there is often a difference between the color you perceive in the real-world under a microscope versus what you would see if you looked at the corresponding digital copy of the same slide. This blog post discusses how to correct for this discrepancy using ICC profiles.
[Note: for simplicity of this blog post, we assume that monitors perfectly represent the quantified color, when in reality they likely won’t, leading to a related but different problem associated with monitor calibration.]
The difference between the real-world value and the digitized value can be measured and adjusted for via color calibration. If you’ve ever done any digital photography, you may be familiar with a concept like white-balancing images, which essentially adjusts image pixels to match the color of the light source so that white objects actually appear white in the resulting photo. A similar concept is at play here, where some slide scanners (appreciating that having the highest fidelity representation is important) perform calibrations of their devices. Others are offering new products which calibrate scanners and monitors. There are also interesting 3rd party tools, like calibration slides, which can also be used to ensure color fidelity.
There have additionally been new open-source projects, for example SVSUtil, which apply color transforms to an SVS file, updating the image tiles in situ.
At the end of the day, these calibration approaches provide a mapping from known real-world values to the corresponding corrected digital values. This allows newly generated images to undergo the same mapping to yield a color calibrated and corrected image.
Here we are going to talk about ICC Profiles, which are a set of standards created by the International Color Consortium (ICC) and are commonly used in color management. Each profile is specific to a certain device and provides a way to ensure consistent color. This should hopefully make some intuitive sense, as each device has its own properties (chip, light source, etc) and thus would require its own color mapping requirements.
Why is this relevant in digital pathology?
It is important to note that there are essentially two approaches to applying color correction/ICC Profiles to images. One option is that a scanner/camera can perform color calibration as the image is produced, resulting in images which are “stored calibrated”. More commonly, however, a raw uncalibrated image is stored, exactly as the scanner produced it, with the associated ICC Profile either stored in the image header or made available separately by the company. The underlying assumption by scanner/camera manufacturers is that downstream software tools are cognizant of ICC profiles, and when reading their images will detect the presence of an ICC Profile and apply that profile when needed for their users.
A question you may be asking yourself right now: “Do common digital pathology tools automatically apply ICC Profiles to their images?”
It turns out, in many cases, automatic ICC profile application does not take place. As an example use case, we will look at openslide, a very popular C library that provides a simple interface to read whole-slide images.
How large are these differences? An example provided by Lee Cooper in a CuCIM GitHub issue gives some impression:
You can hopefully see that there is a clear difference between these two images, although structurally they are exactly the same (i.e., only the color values have changed). This visual difference is a result of applying the ICC profile to the colors in the image, which, I think we can agree, results in a more attractive-looking image on the left-hand side of the figure.
A fair question to ask: “Is what is demonstrated above the maximal difference between an applied and unapplied ICC Profile?”
Unfortunately, the answer is no; sometimes the difference will be more nuanced to the point of being barely noticeable, while in theory, the difference could potentially be much larger.
As a result, one should be aware of the implications and understand how to check/verify the presence of ICC profiles, as well as understand how to apply them.
What happens if we opt to not (or simply didn’t realize that we should) apply ICC profiles if they are supplied with our images?
Primarily, if you’re viewing your slides in your own tools using, e.g., python and openslide, and don’t apply the ICC profile, you may notice that your images are less vibrant (as shown above), and further that they appear differently depending on the tool you’re viewing them with. For example, ImageScope does apply ICC profiles by default, so if you were to compare against an openslide-generated version, you would likely notice a difference. Visually, however, the differences often appear to be on par with other more minor stain variations, so are unlikely to impact human interpretation of the slide.
On the other hand, what if we’re trying to train and deploy deep learning classifiers? Well, if you’re operating on a single site with a single scanner, and have never applied an ICC profile, you’ve likely not experienced any detrimental effects as all slides produced will be in the same non-calibrated space.
You may now start to see a possible cause for concern: what if you’re using data from multiple sources, scanners, software? Practically speaking, as mentioned above, the impact may not be especially significant since the variations are typically minor as compared to larger inherent stain and scanner variability. But this is highly dependent on the specific algorithm/processing pipeline you may be applying.
That said, at least in my opinion, if the scanner already knows that its resulting slides need calibration, and has provided you with the information needed to (as we’ll see, relatively trivially) correct for these color errors, it makes sense to me to apply this correction. It especially makes sense to do so before applying more aggressive stain normalization techniques, since color calibration is something of a “given ground truth” that is essentially a free noise reduction.
In conclusion: we spend a lot of time trying to homogenize and stain-normalize our data, so it stands to reason that applying an inexpensive color transformation, to at least attempt to correct for known and measured scanner profiles/deficiencies, may be a step in the right direction!
Reading an ICC profile
We can use PIL (easier) or a TIFF reading library to first read the profile from a whole slide image, in this case an image from the TCGA cohort.
The PIL version looks like this:
```python
# PIL version
import io
from PIL import Image

icc = Image.open(fname).info.get('icc_profile')
f = io.BytesIO(icc)
```
While a tifffile version looks like this:
```python
# tifffile version
import io
import tifffile

with tifffile.TiffFile(fname) as tif:
    tag = tif.pages[0].tags[34675]  # InterColorProfile tag
    f = io.BytesIO(tag.value)
```
Notably, both of them read the associated tag, number 34675 (0x8773 in hex), as defined in the TIFF standard (see the discussion of embedded ICC Profiles here). This returns a byte string, in this case of 141,992 bytes, with a snippet shown here:
In both instances, we will next want to convert this byte-stream into a PIL.ImageCms.ImageCmsProfile object, as provided by the PIL package.
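As a minimal sketch of this conversion (using an sRGB profile created in memory to stand in for the scanner's embedded byte string, since the slide file itself isn't included here):

```python
import io
from PIL import ImageCms

# In-memory sRGB profile standing in for the byte string read from the slide;
# in practice `icc_bytes` would come from the TIFF tag or PIL's info dict
icc_bytes = ImageCms.ImageCmsProfile(ImageCms.createProfile("sRGB")).tobytes()

# Wrap the raw bytes in a file-like object and build the profile object
prf = ImageCms.ImageCmsProfile(io.BytesIO(icc_bytes))
print(ImageCms.getProfileDescription(prf))
```

The resulting `ImageCmsProfile` object is what the transform-building functions below expect as input.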
Building a transform
When looking at the PIL documentation, it states that if we anticipate performing more than one transform, we should explicitly build a transform and then apply it, noting that the building process is expensive:
Building the transform is a fair part of the overhead in ImageCms.profileToProfile(), so if you’re planning on converting multiple images using the same input/output settings, this can save you time. Once you have a transform object, it can be used with ImageCms.applyProfile() to convert images without the need to re-compute the lookup table for the transform.
```python
#icc2rgb = ImageCms.buildTransformFromOpenProfiles(rgbp, prf, "RGB", "RGB")  # swapped
icc2rgb = ImageCms.buildTransformFromOpenProfiles(prf, rgbp, "RGB", "RGB")  # correct
```
Update (Nov 2022): it was pointed out to me that in the previous version of this post, the two profiles were (intentionally) switched (source vs. target). When I tested both versions, the ‘swapped’ version appeared to result in a better-colored image, so I thought the ordering to be an error in the implementation, but that doesn’t seem to be the case. Practically speaking, I’m unsure what to make of this; below you can see the “correct” image versus the “swapped” image. The swapped image seems to have better coloring and white balancing. I suspect that if the same profile is uniformly applied during training/testing, there will be little practical difference in the resulting output, but that should be experimentally validated.
Applying a transform
Applying the transform is now quite easy, as a single line of code:
```python
result = ImageCms.applyTransform(img, icc2rgb)
```
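Putting the pieces together, here is a minimal end-to-end sketch. The image and both profiles are synthetic stand-ins (a flat image and two in-memory sRGB profiles); in practice `prf` would be the profile read from the slide and `img` a tile extracted from it:

```python
from PIL import Image, ImageCms

# Synthetic stand-ins for a slide tile and its embedded/display profiles
img = Image.new("RGB", (1000, 1000), (239, 236, 239))
prf = ImageCms.ImageCmsProfile(ImageCms.createProfile("sRGB"))
rgbp = ImageCms.ImageCmsProfile(ImageCms.createProfile("sRGB"))

# Build the transform once (expensive), then apply it to as many tiles as needed (cheap)
icc2rgb = ImageCms.buildTransformFromOpenProfiles(prf, rgbp, "RGB", "RGB")
result = ImageCms.applyTransform(img, icc2rgb)
```

Since both profiles here are identical sRGB, the output is essentially unchanged; with a real scanner profile as the source, the color shift described above would appear.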
Notably, the computational overhead is quite minimal, in this case for a 1,000 x 1,000 image:
24.8 ms ± 1.45 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
Or for a 5,000 x 5,000 image:
630 ms ± 35.4 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
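Timings along these lines can be reproduced roughly as follows (numbers will of course vary by machine; the transform here is a stand-in built from two in-memory sRGB profiles rather than a real scanner profile):

```python
import timeit
from PIL import Image, ImageCms

# Stand-in transform built from two identical in-memory sRGB profiles
prf = ImageCms.ImageCmsProfile(ImageCms.createProfile("sRGB"))
rgbp = ImageCms.ImageCmsProfile(ImageCms.createProfile("sRGB"))
icc2rgb = ImageCms.buildTransformFromOpenProfiles(prf, rgbp, "RGB", "RGB")

for side in (1000, 5000):
    img = Image.new("RGB", (side, side), (239, 236, 239))
    secs = timeit.timeit(lambda: ImageCms.applyTransform(img, icc2rgb), number=5) / 5
    print(f"{side} x {side}: {secs * 1000:.1f} ms per call")
```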
Comparing the difference
In particular, if we look at the white regions of the image, we can see that their value has increased, and now we see “perfect” white where it is expected.
This is the original 2k x 2k image:
In this zoomed region, we can see it has a white value of [239, 236, 239] within the green circle:
And in the ICC profile applied image, where this white value has now been calibrated, it is [255,255,255] as expected.
We can also see the differences visually in the RGB distributions, where channels 0, 1, 2 are R, G, B respectively. Overall, a lot of values are being “lightened” (value raised), with in particular a number of pixels now being pegged to the full intensity value of 255:
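One way to compute such per-channel distributions, sketched with numpy on hypothetical `before`/`after` arrays (flat synthetic images are used here purely for illustration):

```python
import numpy as np
from PIL import Image

# Synthetic flat images standing in for the uncalibrated and calibrated tiles
before = np.asarray(Image.new("RGB", (100, 100), (239, 236, 239)))
after = np.asarray(Image.new("RGB", (100, 100), (255, 255, 255)))

for c, name in enumerate("RGB"):
    hb, _ = np.histogram(before[..., c], bins=256, range=(0, 256))
    ha, _ = np.histogram(after[..., c], bins=256, range=(0, 256))
    # Report the most common intensity value per channel, before vs. after
    print(f"channel {c} ({name}): mode {hb.argmax()} -> {ha.argmax()}")
```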
It is fair to ask which pixels exactly are affected in which channel, and for that I’ve created the following visualizations in the BWR color space.
If you’re unfamiliar with this divergent color space, I would highly suggest reading more about it here, as it is rapidly becoming my favorite for comparing two images (and is in particular fantastic for comparing registration images). Briefly, it is similar in concept to a heatmap, except that the color white implies no change; the more positive the change in value, the redder the pixel becomes, and the more negative the change, the bluer it becomes. As such, it looks like this:
I’ve also taken the BWR image, pulled out the positive and negative values (separately), and overlaid them on the original input image to help with localization:
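A sketch of how such a BWR difference map can be produced with matplotlib's `bwr` colormap. Here `before` and `after` are synthetic stand-ins for the uncalibrated and calibrated images; the key detail is the symmetric `vmin`/`vmax` so that zero change maps to white:

```python
import matplotlib
matplotlib.use("Agg")  # render off-screen
import matplotlib.pyplot as plt
import numpy as np
from PIL import Image

# Synthetic stand-ins for the uncalibrated/calibrated images
before = np.asarray(Image.new("RGB", (100, 100), (239, 236, 239))).astype(int)
after = np.asarray(Image.new("RGB", (100, 100), (255, 255, 255))).astype(int)

diff = after - before          # signed per-channel change
lim = np.abs(diff).max()       # symmetric limits so zero maps to white

fig, axes = plt.subplots(1, 3, figsize=(12, 4))
for c, (ax, name) in enumerate(zip(axes, "RGB")):
    im = ax.imshow(diff[..., c], cmap="bwr", vmin=-lim, vmax=lim)
    ax.set_title(f"{name} channel")
fig.colorbar(im, ax=axes.tolist())
fig.savefig("bwr_diff.png")
```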
These results are quite interesting!
Generally, we can see that the values are overall moved in the positive direction in the green and blue channels, while less affected in the red channel, suggesting that the scanner accurately reproduces red values without the need for calibration.
Furthermore, we can see that the associated white regions of the original image experience positive shifts in all the color channels, essentially “correcting” the 239-valued white we saw above to the desired 255 value. Interestingly, we see that many of the nuclei (stained in blue) are actually decreased in value as a result of this calibration process, which, when qualitatively comparing the images, seems to result in more attractive “popping”.
As hardware, software, and calibration technologies improve, I suspect applying color calibration will become more critical for obtaining a high-fidelity digital representation of the physical tissue.
If you are unsure whether the software you use takes existing ICC Profiles into account (or, in the case of scanners, produces them), I would suggest either asking or putting together a quick experiment against software (e.g., openslide) which is (currently) definitely not applying them. This gives a benchmark to measure against: if the images are the same, then no profile is likely applied (or available); if they are different, then a color profile application may be in effect.
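Such an experiment could be sketched as follows: take a region exported by your tool, check for an embedded profile, apply it, and measure how much the pixels change. The image and filename here are synthetic stand-ins (a flat image with an sRGB profile attached in memory); in practice you would `Image.open(...)` a real export:

```python
import io
import numpy as np
from PIL import Image, ImageCms

# Synthetic stand-in for a region exported from your viewer; in practice use
# Image.open("region.png") on a real export instead of the two lines below
img = Image.new("RGB", (64, 64), (239, 236, 239))
img.info["icc_profile"] = ImageCms.ImageCmsProfile(
    ImageCms.createProfile("sRGB")).tobytes()

icc = img.info.get("icc_profile")
if icc is None:
    print("no embedded profile -- the tool likely stripped it, or never had one")
else:
    prf = ImageCms.ImageCmsProfile(io.BytesIO(icc))
    srgb = ImageCms.ImageCmsProfile(ImageCms.createProfile("sRGB"))
    corrected = ImageCms.profileToProfile(img, prf, srgb)
    diff = np.abs(np.asarray(img, dtype=int) - np.asarray(corrected, dtype=int))
    print("max per-channel difference:", diff.max())
```

A large maximum difference suggests the tool handed you uncalibrated pixels; a difference near zero suggests the profile was already applied (or, as with the identical profiles in this synthetic example, is a no-op).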
Overall, my intuition suggests that applying this calibration will improve the robustness of rudimentary downstream tools (e.g., thresholding, k-means clustering, stain deconvolution), which is always a welcome improvement.
The associated snippets of code employed in this post are available here; happy calibrating!
Thanks to Profs Lee Cooper and David Gutman for all the discussions surrounding this post!