Use Case 5: Mitosis Detection

This blog posts explains how to train a deep learning mitosis detector in accordance with our paper “Deep learning for digital pathology image analysis: A comprehensive tutorial with selected use cases”.

Please note that there has been an update to the overall tutorial pipeline, which is discussed in full here.

This text assumes that Caffe is already installed and running. For guidance on that you can reference this blog post which describes how to install it in an HPC environment (and can easily be adopted for local linux distributions).


As mentioned in the paper, the goal of this project was to create a single deep learning approach which worked well across many different digital pathology tasks. On the other hand, each tutorial is intended to be able to stand on its own, so there is a large amount of overlapping material between Use Cases.

If you’ve already read the Use Case 1: Nuclei Segmentation, in this tutorial you can focus on “Step 1: Patch Extraction” and “Step 6: Refine the classifier”, which contains all the differences from the nuclei tutorial.


The number of mitosis present per high power field is another prognostic indicator used in determining breast cancer grade. Typically, the more aggressive the cancer, the faster the cells are dividing which can be approximated by counting the mitotic events in a histologic snapshot. The current grading scheme divides the mitotic counts into three categories per 10 high power fields, (i) <= 7 mitoses, (ii) 8 – 14 mitoses, and (iii) >= 15 mitoses. This area has recently been shown to be of great interest with many public competitions taking place [6], [5], [7].

In practice, pathologists rely on changing the focal length of an optical microscope to visualize 3 dimensionally the mitotic structure, allowing them to eliminate false positives from their estimates. As such, accurately identifying mitosis on a 2 dimensional digital histology image is very difficult but highly sought after as it would allow for the automatic interrogation of existing large, long-term, repositories. An open question in the field is trying to determine the minimal amount of accuracy necessary for clinical usage.



We break down this approach into 6 steps:

Step 1: Patch Extraction (Matlab): extract patches from all images of both the positive and negative classes

Step 2: Cross-Validation Creation (Matlab): at the patient level, split the patches into a 5-fold training and testing sets

Step 3: Database Creation (Bash): using the patches and training lists created in the previous steps, create 5 sets of leveldb training and testing databases, with mean files, for high performance DL training.

Step 4: Training of DL classifier (Bash):  Slightly alter the 2 prototxt files used by Caffe, the solver and the architecture to point to the correct file locations. Use these to train the classifier.

Step 5: Generating Output on Training  Images (Python): Use final model to generate the output on the training images

Step 6: Refine the classifier: using the output on the training images, we can identify regions which are “difficult” false positive and false negative cases. We use their posterior probabilities to re-sample the images, and repeat all steps, to create a more refined classifier.

There are, of course, other ways of implementing a pipeline like this (e.g., use Matlab to directly create a leveldb, or skip the leveldb entirely, and use the images directly for training) . I’ve found using the above pipeline fits easiest into the tools that are available inside of Caffe and Matlab, and thus requires the less  maintenance and reduces complexity for less experienced users. If you have a suggested improvement, I’d love to hear it!

Dataset Description

The dataset consist of 311 images of size 2,000 x 2,000 @40x selected from 12 breast cancer (BCa) patients. In total there are 550 mitosic centers expertly identified using a focal microscope.

The format of the files is:

01_01.tif: original H&E image

01_01.csv: x,y coordinates of where the mitotic center are

01_01_pc.png: a helper image which places circles at mitotic centers

01_01_cmask.png: the blue ratio segmentation mask

The data and the associated masks are located here (3.3G).

Examples of the multiple files of a single image (cropped), is shown below:

mitosis-01_01r mitosis-01_01_pcr
Original Image Centers Marked of Mitosis
mitosis-01_01_brmaskr mitosis-01_01_cmaskr
Blue Ratio segmented Image Dilated Version of Blue Ratio

Step 1: Patch Extraction (Matlab)

We refer to step1_make_patches.m, which is fully commented.

We take a number of measures to address this challenging task, some inspired by the first place competitor in the competition [6]. Since our input patches to our network are smaller than theirs (32 x 32 as compared to 101 x 101), we modified and extended their approach accordingly. In order to provide enough context for each of the patches, we perform all operations at 20x apparent magnification, such that an entire mitosis can fit into a single patch. This is most important in cases where the mitosis is in the anaphase or telophase, and the coordinates provided by the ground truth are actually in the middle of the two new cells

A high level understanding is provided here:

        1. For the positive class we take each known mitosis location,
          and use a 4 pixel radius around it as each training patch. Since there are very few training pixels available, we add a large number of rotations to augment the training set, in this case rotations of {0; 45; 90; 135; 180; 215; 270}
        2. For the negative class, and to reduce both computation, and improve the selection of patches, we leverage a well-known segmentation technique termed blue-ratio segmentation, since there is evidence that mitosis are highlighted in these regions [35], [36]. We take these blue ratio segmentations, and dilate them to a 20 disk radial mask. This creates regions which we will sample the negative patches from, as it naturally removes trivial examples from the learning process. We sample 2.5 times as many patches as positive patches but only rotate each of them {0; 90; 180; 270} degrees, so that we have more unique patches instead of simply rotated images.
Mitosis center in the middle of yellow circle
Blue ratio segmentation mask
Dilated Blue ratio segmentation mask
      We can see the blue ratio shift image and dilate it to greatly reduce the total area of interest in a sample. In the first image, we can see that the mitosis is indeed located in the middle of the image, included our computational mask. We can see that the mitosis is in the telophase stage, such that the DNA components have split into two pieces (in yellow circle), making it more difficult to identify.
            1. From each of these randomly crop patches, save them to disk. At the same time, we maintain a “patient_struct”, which contains all of the file names which have been written to disk.

The file name format for each patch is as follows:

u_v_w_x_y_z.png   — > example 01_01_1_n_1_0.png

Where u is the patient number, v is the image number, w is the class (0 (not mitosis) or 1 (mitosis center)), x is the type of patch (“n” is negative, “p” is positive, redundant in this use case with w), y is the number of the patch from that image, and z is rotation (0, 45, 90, 135, 180, 215, 270 in this case).

Step 2: Cross-Validation Creation (Matlab)

Now that we have all of the patches written to disk, and we have all of the file names saved into patient_struct, we want to split them into a cross fold validation scheme. We use step2_make_training_lists.m for this, which is fully commented.

In this code, we use a 5-fold validation and for each fold, we create 4 text files. Using fold 1 as an example:

train_w32_parent_1.txt: This contains a list of the images which have been used as part of this fold’s training set. This is similar to test_w32_parent_1.txt, which contains the images used for the test set. An example of the file content is:


train_w32_1.txt: contains the filenames of the patches which should go into the training set (and test set when using test_w32_1.txt). The file format is [filename] [tab] [class]. Where class is either 0 (non-mitosis) or 1 (mitosis-center). An example of the file content is:

01_01_1_n_1_0.png 0
01_01_1_n_1_90.png 0
01_01_1_n_1_180.png 0
01_01_1_n_1_270.png 0
01_01_1_n_2_0.png 0
01_01_1_n_2_90.png 0
01_01_1_n_2_180.png 0
01_01_1_n_2_270.png 0
01_01_1_n_3_0.png 0

All done with the Matlab component!

Step 3: Database Creation (Bash)

Now that we have both the patches saved to disk, and training and testing lists split into a 5-fold validation cohort, we need to get the data ready for consumption by Caffe. It is possible, at this point, to use an Image layer in Caffe and skip this step, but it comes with 2 caveats, (a) you need to make your own mean-file and ensure it is in the correct format and (b) an image layer can is not designed for high throughput. Also, having 100k+ files in a single directory can bring the system to its knees in many cases (for example, “ls”, “rm”, etc), so it’s a bit more handy to compress them all in to 10 databases (1 training and 1 testing for 5 folds), and use Caffe’s tool to compute the mean-file.

For this purpose, we use this bash file:

We run it in the “subs” directory (“./” in these commands), which contains all of the patches. As well, we assume the training lists are in “../”, the directory above it.

Here we’ll briefly discuss the general idea of the commands, while the script has additional functionality (computes everything in parallel for example).

Creating Databases

We use the caffe supplied convert_imageset tool to create the databases using this command:

~/caffe/build/tools/convert_imageset -shuffle -backend leveldb   ./ DB_train_1

We first tell it that we want to shuffle the lists, this is very important. Our lists are in patient and class order, making them unsuitable for stochastic gradient descent. Since the database stores files, as supplied, sequentially, we need to permute the lists. Either we can do it manually (e.g., use sort –random) , or we can just let Caffe do it :)

We specify that we want to use a leveldb backend instead of a lmdb backend. My experiments have shown that leveldb can actually compress data much better without the consequence of a large amount of computational overhead, so we choose to use it.

Then we supply the directory with the patches, supply the training list, and tell it where to save the database. We do this similarly for the test set.

Creating mean file

To zero the data, we compute mean file, which is the mean value of a pixel as seen through all the patches of the training set. During  training/testing time, this mean value is subtracted from the pixel to roughly “zero” the data, improving the efficiency of the DL algorithm.

Since we used a levelDB database to hold our patches, this is a straight forward process:

~/caffe/build/tools/compute_image_mean DB_train_1 DB_train_w32_1.binaryproto -backend leveldb

Supply it the name of the database to use, the mean filename to use as output and specify that we used a leveldb backend. That’s it!

Step 4: Training of DL classifier (Bash)

Setup files

Now that we have the databases, and the associated mean-files, we can use Caffe to train a model.

There are two files which need to be slightly altered, as discussed below:

BASE-alexnet_solver.prototxt: This file describes various learning parameters (iterations, learning method (Adagrad) etc).

On lines 1 and 10 change: “%(kfoldi)d” to be the number of the fold for training (1,2,3,4,5).

On line 2: change “%(numiter)d” to number_test_samples/128. This is to have Caffe iterate through the entire test database. Its easy to figure out how many test samples there are using:

wc –l test_w32_1.txt

BASE-alexnet_traing_32w_db.prototxt: This file defines the architecture.

We only need to change lines 8, 12, 24, and 28 to point to the correct fold (again, replace “%(kfoldi)d” with the desired integer). That’s it!

Note, these files assume that the prototxts are stored in a directory called ./model and that the DB files and mean files are stored in the directory above (../). You can of course use absolute file path names when in doubt.

In our case, we had access to a high performance computing cluster, so we used a python script ( to submit all 5 folds to be trained at the same time. This script automatically does all of the above work, but you need to provide the working directory on line 11. I use this (BASE-qsub.pbs)  PBS script to request resources from our Torque scheduler, which is easily adaptable to other HPC environments.

Initiate training

If you’ve used the HPC script above, things should already be queued for training. Otherwise, you can start the training simply by saying:

~/caffe/build/tools/caffe train –solver=1-alexnet_solver_ada.prototxt

In the directory which has the prototxt files. That’s it! Now wait until it finishes (600,000) iterations. :)

Step 5: Generating Output on Training Images (Python)

At this point, you should have a model available, to generate some output images. Don’t worry, if you don’t, you can use mine.

Here is a python script, to generate the test output for the associated k-fold ( We need to change line 63 from “test” to “train“, as we want to generate output for our training images to create a more robust secound round classifier.

It takes 2 command line arguments, base directory and the fold. Make sure to edit line 88 to apply the appropriate scaling; in this case downscale the images to by 1/2.

The base directory is expected to contain:

BASE/images: a directory which contains the tif images for output generation

BASE/models: a directory which holds the 5 models (1 for each fold)

BASE/train_w32_parent_X.txt: the list of parent IDs to use in creating the output for fold X=1,2,3,4,5, created in step 2

BASE/DB_train_w32_X.binaryproto: the binary mean file for fold X=1,2,3,4,5, created in step 3

It generates 2 output images for each input. A “_class” image and a “_prob” image. The “_prob” image is a 3 channel image which contains the likelihood that a particular pixel belongs to the class. In this case, the Red channel represents the likelihood that a pixel is a not a mitosis center, while the green channel represents the likelihood that a pixel is a mitosis center. The two channels sum to 1. The “_class” image is a binary image using the argmax of the “_probs image”.

Notice that the output masks are 1000 x 1000, or 1/2 the size of the input masks. This is very important, we trained the classifier at “20x”, so we need to test it at 20x. Since the images are all taken at 40x, we down-sample them by 1/2 to obtain the correct apparent magnification .

Step 6: Refine the classifier

As noted in our manuscript, at this stage the classifier will likely yield an f-score of .37+/-.2, which is far lower than optimal. Using step6_make_patches_probs_based_blue_ratio.m, we repeat the process. Except this time, in the spirit of [6], we use the output created in the previous step (which was generated on the training set), to provide posterior probabilities for each pixel. This allows us to identify which patches are likely confusing to the DL classifier and require specialized focus. This script again yields a patient’s struct, and all of the rest of the steps can be redone using these more optimal patches, including the output generation (make sure to replace “train” with “test”, so that the output is generated on the test images).

Input Region of Interest
Ground Truth Locations

Probability Map Output

Note that the the posterior probabilities are computed for every pixel in the test image. To identify a singular location most likely to be the center of a lymphocyte, a convolution can be performed using a disk kernel on the _prob image so that the center of the probably regions are highlighted. Iteratively, the highest point in the image is taken as center and a radius is cleared, which is the same size as a typical lymphocyte to prevent multiple centers from being identified for the same lymphocyte. An example of that process is presented here: step5_identify_peak_in_probs.m

Typically, you’ll want to use a validation set to determine an optimal threshold as it is often not .5 (which is equivalent to argmax). Subsequently, use this threshold on the the “_prob” image to generate a binary image.

Final Notes

Efficiency in Patch Generation

Writing a large number of small, individual files to a harddrive (even SSD) is likely going to take a very long time. Thus for Step 1 & Step 2, I typically employ a ram disk to drastically speed up the processes.  Regardless, make sure Matlab does not have the output directory in its search path, otherwise it will likely crash (or come to a halt), while trying to update its internal list of available files.

As well, using a Matlab Pool (matlabpool open), opens numerous workers which also greatly speed up the operation and is recommended as well.

Efficiency in Output Generation

It most likely will take a long time to apply a the classifier pixel wise to an image to generate the output. In actuality, there are many ways to speed up this process. The easiest way in this case is to solely apply the classifier in places where the dilated blue ratio image is positive, as these are the likely locations for mitosis.

Keep an eye out for future posts where we delve deeper into this and provide the code which we use!


It is very important to use the model on images of the same magnification as the training magnification. This is to say, if your patches are extracted at 20x, then the test images need to be done at 20x as well.

Code is available here

Data is available here (3.3G)

3 thoughts on “Use Case 5: Mitosis Detection”

  1. Thanks for your excellent work. I’m following it and I met some problems. I’m wondering how to get the “Blue Ratio segmented Image” in the Data zip. And what is the “blue ratio segmentation approach”? Did you follow the whole MRGC algorithm or just part of it? Could you please share your code or your pipeline of it?

Leave a Reply

Your email address will not be published. Required fields are marked *