On Stain Normalization in Deep Learning

Just wanted to take a moment and share some quick stain-normalization experimental results. We have a trained in-house nuclei segmentation model which works fairly well when the test images have similar stain presentation properties to the training data, but when new datasets arrive which are notably different, we tend to see a decrease in classifier performance.

Here we look at one of these images and ways of improving classifier robustness.

We take an input image (left) and apply a nuclei segmentation model to produce the output mask (right), where we can see the quality is in fact very poor:

[Images: bruog (input) | bruog_output (segmentation mask)]

When we compare this test image to the types of images in the training set, we can immediately see that they don’t live in a similar color space:

[Image: template — example image from the training set]

One very fast way to see if color normalization would help is to take the test image and perform a per-channel histogram match to an image from the training set (we’ll call it a template image “ref”). This takes only a few lines of MATLAB code:

  out=zeros(size(io),'like',io);   % preallocate output, same type/size as input
  for zz=1:3                       % match each RGB channel independently
      out(:,:,zz)=imhistmatch(io(:,:,zz),ref(:,:,zz));
  end
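For readers outside MATLAB, the idea behind `imhistmatch` can be sketched in plain Python using a rank-based mapping (a simplification of the real histogram-bin approach; `match_histogram` and the flat channel lists are illustrative, not part of the actual pipeline):

```python
def match_histogram(source, reference):
    """Map each source value to the reference value of equal rank,
    a rough rank-based approximation of histogram matching."""
    # indices of source pixels, ordered from darkest to brightest
    src_order = sorted(range(len(source)), key=lambda i: source[i])
    ref_sorted = sorted(reference)
    out = [0] * len(source)
    for rank, i in enumerate(src_order):
        # reference value at the same relative rank
        j = min(rank * len(ref_sorted) // len(source), len(ref_sorted) - 1)
        out[i] = ref_sorted[j]
    return out
```

Running this once per flattened color channel mirrors the per-channel loop above: the darkest source pixel takes on the darkest reference intensity, and so on up the ranks.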

A version which supports ignoring white space:

  back=rgb2gray(io)>200;          % near-white pixels are background
  idx=find(~back);                % indices of tissue (non-background) pixels
  for zz=1:3
      ioc=io(:,:,zz);
      refc=ref(:,:,zz);
      ioutt=imhistmatch(ioc(idx),refc(idx));  % match using only tissue pixels
      ioc(idx)=ioutt;
      io(:,:,zz)=ioc;
  end
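The white-space test itself is easy to port; here is a rough Python equivalent of `rgb2gray(io)>200`, using the same luma weights MATLAB applies (`background_mask` and the pixel-tuple layout are my own illustrative choices):

```python
def background_mask(rgb_pixels, thresh=200):
    """Flag near-white pixels as background so histogram matching
    can skip them. Uses the standard ITU-R BT.601 luma weights,
    which is also what MATLAB's rgb2gray uses."""
    return [0.2989 * r + 0.5870 * g + 0.1140 * b > thresh
            for r, g, b in rgb_pixels]
```

Histogram matching would then be run only on the pixels where this mask is False, as the MATLAB loop above does with `idx`.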

which produces the same input image in a slightly different color space. Note that there are even obvious image artifacts from this crude process:

[Image: bruog_template — test image after histogram matching]

Now we take this new image and apply the same classifier:

[Image: bruog_template_output — segmentation mask on the normalized image]

where we can see the normalization has had a profound effect, and arguably yields results similar to those of an image from the original training cohort.

So, in this particular case, even gross stain normalization produces a significant improvement in results.

The real question now: is it possible to perform data augmentation at training time so that these images will work “naturally” (without individually normalizing them)? To that end I wrote an augmentation layer for Caffe which, during training, randomly modifies the color space in hopes of improving robustness.

I estimated the parameters by looking at how a test image differs from a training image:

  for zz=1:3
      c=double(io(:,:,zz));
      cmean=mean(c(:));           % channel mean of the test image
      cref=double(iref(:,:,zz));
      crefmean=mean(cref(:));     % channel mean of the training template
      [cmean,crefmean,cmean/crefmean]
  end

  ans = 210.0452  172.8703  1.2150
  ans = 152.0753  113.6565  1.3380
  ans = 207.2472  152.5671  1.3584

And finally settled on:

  param_str: '{"rotate": True, "color": True, "color_a_max": 1.5, "color_a_min": .3}'
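As a sketch of what such a color augmentation might do (an illustrative Python mock-up, not the actual Caffe layer; `augment_color` and its arguments are assumptions), each channel is scaled by an independent random gain drawn from [color_a_min, color_a_max]:

```python
import random

def augment_color(rgb_pixels, a_min=0.3, a_max=1.5, rng=random):
    """Randomly rescale each color channel of a patch, mimicking a
    stain/color augmentation. One gain per channel, shared by all pixels."""
    gains = [rng.uniform(a_min, a_max) for _ in range(3)]
    # scale each channel and clip back into the 8-bit range
    return [tuple(min(255, int(c * g)) for c, g in zip(px, gains))
            for px in rgb_pixels]
```

Applying this to each training patch exposes the network to many plausible stain presentations, rather than only the one the scanner happened to produce.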

Then retrained the network and produced output on the unaltered input image:

[Images: bruog (unaltered input) | bruog_output_retrain (mask from retrained network)]

[Image: bruog_output_retrain_overlay]

This provides evidence for the hypothesis that if you can quantitatively measure differences between the training and testing sets, and accommodate for them using data augmentation at training time, the network will learn to be robust to them.

Another idea I tested is to completely remove the mean file from both the training and testing phases. Instead, each patch is individually mean-centered on the fly. Essentially this should help combat gross colorspace translations, which tend to show up as brightness shifts. By still using the augmentation layer before this per-patch centering, we obtain a representation centered around zero regardless of stain presentation. The results are stunning as well.
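The per-patch centering can be sketched as follows (a minimal Python illustration; `center_patch` and the pixel-tuple layout are my own, not the Caffe implementation):

```python
def center_patch(patch):
    """Subtract each channel's own mean from every pixel, so the patch
    is zero-centered regardless of its overall stain brightness."""
    n = len(patch)
    means = [sum(px[c] for px in patch) / n for c in range(3)]
    return [tuple(px[c] - means[c] for c in range(3)) for px in patch]
```

Because the mean is computed from the patch itself rather than from a global training-set mean file, a uniformly brighter or darker stain simply shifts the mean and is removed, landing every patch in the same zero-centered coordinate system.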

[Image: bruog_output_retrain_nomean]

An approach like this is likely something we should look further into, especially in the context of cross-site normalization.
