Use Case 6: Invasive Ductal Carcinoma (IDC) Segmentation

November 9, 2015 choosehappy 77 Comments

This blog posts explains how to train a deep learning Invasive Ductal Carcinoma (IDC) classifier in accordance with our paper “Deep learning for digital pathology image analysis: A comprehensive tutorial with selected use cases”.

Please note that there has been an update to the overall tutorial pipeline, which is discussed in full here.

This text assumes that Caffe is already installed and running. For guidance on that you can reference this blog post which describes how to install it in an HPC environment (and can easily be adopted for local linux distributions).

Preface

As mentioned in the paper, the goal of this project was to create a single deep learning approach which worked well across many different digital pathology tasks. On the other hand, each tutorial is intended to be able to stand on its own, so there is a large amount of overlapping material between Use Cases.

Since the data was provided to us, at the patch level, we are able to reproduce the exact training and test sets of [8]. As a result, we don’t perform any k-fold testing, so we have 1 less step than the previous use cases.

Background

Invasive Ductal Carcinoma (IDC) is the most common subtype of all breast cancers. To assign an aggressiveness grade to a whole mount sample, pathologists typically focus on the regions which contain the IDC. As a result, one of the common pre-processing steps for automatic aggressiveness grading is to delineate the exact regions of IDC inside of a whole mount slide.
We obtained the exact dataset, down to the patch level, from the authors of [8] to allow for a head to head comparison with their state of the art approach, and recreate the experiment using our network. The challenge, simply stated, is can our smaller more compact network produce comparable results? Our approach is at a notable disadvantage as their network accepts patches of size 50 x 50, while ours use 32 x 32 thus being provided 60% less pixels of context to the classifier.

Overview

We break down this approach into 4 steps:

Step 1: Patch Extraction (Matlab): extract patches from all images of both the positive and negative classes and generate the training and test list

Step 2: Database Creation (Bash): using the patches and training lists created in the previous step, create a leveldb training and testing database with mean files, for high performance DL training.

Step 3: Training of DL classifier (Bash): Run the provided prototxt files (solver and architecture) to train the classifier using Caffe.

Step 4: Generating Output on Test Images (Python): Use final model to generate the output

There are, of course, other ways of implementing a pipeline like this (e.g., use Matlab to directly create a leveldb, or skip the leveldb entirely, and use the images directly for training) . I’ve found using the above pipeline fits easiest into the tools that are available inside of Caffe and Matlab, and thus requires the less maintenance and reduces complexity for less experienced users. If you have a suggested improvement, I’d love to hear it!

Dataset Description

The original dataset consisted of 162 whole mount slide images of Breast Cancer (BCa) specimens scanned at 40x. From that, 277,524 patches of size 50 x 50 were extracted (198,738 IDC negative and 78,786 IDC positive).

Each patch’s file name is of the format:

u_xX_yY_classC.png — > example 10253_idx5_x1351_y1101_class0.png

Where u is the patient ID (10253_idx5), X is the x-coordinate of where this patch was cropped from, Y is the y-coordinate of where this patch was cropped from, and C indicates the class where 0 is non-IDC and 1 is IDC.

The data and traning/test set partitions are located here (1.6G).

Examples of these images can be seen below

Step 1: Patch Extraction (Matlab)

We refer to step1_make_patches_and_list_all_types.m, which is fully commented and contains options for the the 3 versions discussed in the paper (a) cropped version, (b) resized version, (c) resized with additional rotations for class balancing.

A high level understanding is provided here:

Load the training, validation and test files which indicate which patients were used for which stage.
For each image, load its respective patches and either resize them to fit our architecture or crop them (50×50 – > 32 x 32).
Save the modified patches to them to disk. At the same time, we write 6 files, indicating which patches belong in which set (Training, testing, validation ). These files are

train_w32_parent_1.txt: This contains a list of the patient IDs which have been used as part of the training set. This is similar to valid_w32_parent_1.txt and test_w32_parent_1.txt, for the validation and testing sets respectively. An example of the file content is:

10304
9346
9029
12911

train_w32_1.txt: contains the filenames of the patches which should go into the training set (and test and validation sets when using test_w32_1.txt and valid_w32_1.txt, respectively). The file format is [filename] [tab] [class]. Where class is either 0 (non-IDC) or 1 (IDC). An example of the file content is:

12909_idx5_x101_y1301_class0.png 0
12909_idx5_x101_y1301_class0r.png 0
12909_idx5_x1151_y1351_class0.png 0
12909_idx5_x1151_y1351_class0r.png 0

All done with the Matlab component!

Step 2: Database Creation (Bash)

Now that we have both the patches saved to disk, and training and testing lists, we need to get the data ready for consumption by Caffe. It is possible, at this point, to use an Image layer in Caffe and skip this step, but it comes with 2 caveats, (a) you need to make your own mean-file and ensure it is in the correct format and (b) an image layer can is not designed for high throughput. Also, having 100k+ files in a single directory can bring the system to its knees in many cases (for example, “ls”, “rm”, etc), so it’s a bit more handy to compress them all in to 3 databases, and use Caffe’s tool to compute the mean-file.

For this purpose, we use this bash file: step3_make_dbs.sh

We run it in the “subs” directory (“./” in these commands), which contains all of the patches. As well, we assume the training lists are in “../”, the directory above it.

In this use case, we have a validation set given to us, which was used initially in [8] to determine learning variables, but subsequently added into the training set. In our paper we concatenated the two files to create a single larger train_w32_1.txt, as our learning parameters and iterations are fixed, thus not requiring a validation set.

Here we’ll briefly discuss the general idea of the commands, while the script has additional functionality (computes everything in parallel for example).

Creating Databases

We use the caffe supplied convert_imageset tool to create the databases using this command:

~/caffe/build/tools/convert_imageset -shuffle -backend leveldb ./ DB_train_1

We first tell it that we want to shuffle the lists, this is very important. Our lists are in patient and class order, making them unsuitable for stochastic gradient descent. Since the database stores files, as supplied, sequentially, we need to permute the lists. Either we can do it manually (e.g., use sort –random) , or we can just let Caffe do it 🙂

We specify that we want to use a leveldb backend instead of a lmdb backend. My experiments have shown that leveldb can actually compress data much better without the consequence of a large amount of computational overhead, so we choose to use it.

Then we supply the directory with the patches, supply the training list, and tell it where to save the database. We do this similarly for the test set.

Creating mean file

To zero the data, we compute mean file, which is the mean value of a pixel as seen through all the patches of the training set. During training/testing time, this mean value is subtracted from the pixel to roughly “zero” the data, improving the efficiency of the DL algorithm.

Since we used a levelDB database to hold our patches, this is a straight forward process:

~/caffe/build/tools/compute_image_mean DB_train_1 DB_train_w32_1.binaryproto -backend leveldb

Supply it the name of the database to use, the mean filename to use as output and specify that we used a leveldb backend. That’s it!

Step 3: Training of DL classifier (Bash)

Setup files

Now that we have the databases, and the associated mean-file, we can use Caffe to train a model.

There are two files involved, which may need to be slightly altered, as discussed below:

BASE-alexnet_solver.prototxt: This file describes various learning parameters (iterations, learning method (Adagrad) etc).

On lines 1 and 10 change: “%(kfoldi)d” to “1”, since we have only 1 fold.

On line 2: change “%(numiter)d” to number_test_samples/128. This is to have Caffe iterate through the entire test database. Its easy to figure out how many test samples there are using:

“wc –l test_w32_1.txt”

BASE-alexnet_traing_32w_db.prototxt: This file defines the architecture.

We only need to change lines 8, 12, 24, and 28 to point to the correct fold (again, replace “%(kfoldi)d” with 1). That’s it!

Note, these files assume that the prototxts are stored in a directory called ./model and that the DB files and mean files are stored in the directory above (../). You can of course use absolute file path names when in doubt.

In our case, we had access to a high performance computing cluster, so we used a python script (step4_submit_jobs.py) to submit the training process to the HPC. This script automatically does all of the above work, but you need to provide the working directory on line 11. I use this (BASE-qsub.pbs) PBS script to request resources from our Torque scheduler, which is easily adaptable to other HPC environments.

Initiate training

If you’ve used the HPC script above, things should already be queued for training. Otherwise, you can start the training simply by saying:

~/caffe/build/tools/caffe train –solver=1-alexnet_solver_ada.prototxt

In the directory which has the prototxt files. That’s it! Now wait until it finishes (600,000) iterations. 🙂

Step 4: Generating Output on Test Images (Python)

At this point, you should have a model available, to generate some output images. Don’t worry, if you don’t, you can use mine.

Here is a python script, to generate the test output for the associated k-fold (step5_create_output_images_kfold.py).

It takes 2 command line arguments, base directory and the fold. Make sure to edit line 88 to apply the appropriate scaling or cropping depending on your training protocol.

The base directory is expected to contain:

BASE/images: a directory which contains the tif images for output generation

BASE/models: a directory which holds the learned model

BASE/test_w32_parent_1.txt: the list of parent IDs to use in creating the output for fold 1, created in step 1

BASE/DB_train_w32_1.binaryproto: the binary mean file for fold 1 created in step 2

It generates 2 output images for each input. A “_class” image and a “_prob” image. The “_prob” image is a 3 channel image which contains the likelihood that a particular pixel belongs to the class. In this case, the Red channel represents the likliehood that a pixel belongs to the non-IDC class, and the green channel represents the likelihood that a pixel belongs to the IDC class. The two channels sum to 1. The “_class” image is a binary image using the argmax of the “_probs image”.

9176_level4_prob_noresize_morerot_mask_overlay

The annotated image on the left shows in green where the pathologist has identified IDC. On the right, we overlay a heatmap onto the same image, where the more red the pixel is, the more likely it is IDC. We note that the regions at twelve o’clock are not actually false positives, but were too small to be deemed interesting by the pathologist, thus they were not originally labeled.
Typically, you’ll want to use a validation set to determine an optimal threshold as it is often not .5 (which is equivalent to argmax). Subsequently, use this threshold on the the “_prob” image to generate a binary image.

Final Notes

Efficiency in Patch Generation

Writing a large number of small, individual files to a harddrive (even SSD) is likely going to take a very long time. Thus for Step 1 & Step 2, I typically employ a ram disk to drastically speed up the processes. Regardless, make sure Matlab does not have the output directory in its search path, otherwise it will likely crash (or come to a halt), while trying to update its internal list of available files.

As well, using a Matlab Pool (matlabpool open), opens numerous workers which also greatly speed up the operation and is recommended as well.

Efficiency in Output Generation

It most likely will take a long time to apply a the classifier pixel wise to an image to generate the output. In actuality, there are many ways to speed up this process. The easiest way in this case is to simply use a larger stride such that you compute every 2nd or 3rd pixel since IDC segmentation doesn’t require nearly as much precision, as say, nuclei segmentation. Another technique is to simply white threshold away the background regions which often take up a significant portion of the images.

Keep an eye out for future posts where we delve deeper into this and provide the code which we use!

Magnification

It is very important to use the model on images of the same magnification as the training magnification. This is to say, if your patches are extracted at 10x, then the test images need to be done at 10x as well.

Code is available here

Data is available here (1.6G)

77 thoughts on “Use Case 6: Invasive Ductal Carcinoma (IDC) Segmentation”

Mattias says:

October 20, 2016 at 7:47 pm

Hi – Thanks for sharing all of this information. Are the 162 whole slide images available or just the patches linked above? Thanks again. M

Reply
1. choosehappy says:
  
  October 20, 2016 at 8:38 pm
  
  To be fair those “patches” are 2000 x 2000 : ) But yes, at this time only those ROIs are available. To be honest, I’m not entirely sure the WSI they came from are, but there aren’t annotations available for the rest of the slide so at the time when I was curating this dataset they weren’t that interesting. If you need additional images, you can check out the TCGA repository; I believe they currently have over 300 breast cancer cases in WSI format available for free download.
  
  Reply
  1. julia says:
    
    May 11, 2023 at 1:47 pm
    
    Hello, do you have the link to access of the dataset of the images with a higer resoltuion and not 50×50?
    
    Reply
    1. choosehappy says:
      
      May 14, 2023 at 1:37 pm
      
      should be able to access them here: https://bitbucket.org/aacruzr/tcga-bca/src/7effd4ea4c867b0c4a7c702c2812ea7a94009a5f/
      
      Reply
      1. Julia says:
        
        May 14, 2023 at 5:28 pm
        
        Thank you for the response, I have another question. Do you have a link to access the dataset of the original images in the format of PNG, which were 162. I would really appreciate it since I need them to conduct an investigation, and I should require images that are larger than 50×50 in order to study the cases more thoroughly. I appreciate your assistance a lot!!
      2. choosehappy says:
        
        May 15, 2023 at 7:44 am
        
        sorry i don’t have the files you’re looking for, you can try contacting the first author, he may be able to help!
      3. Julia says:
        
        May 14, 2023 at 6:58 pm
        
        Are the whole slides from which these patches were extracted available for download?
      4. choosehappy says:
        
        May 15, 2023 at 7:45 am
        
        the TCGA ones are available via the TCGA data poral Download TCGA Digital Pathology Images (FFPE)
      5. alex says:
        
        May 16, 2023 at 3:51 pm
        
        Hi, do you know who is the first author of the investigation? Just to know where I can find the original files. Thanks
      6. choosehappy says:
        
        May 17, 2023 at 7:58 am
        
        take a look here: https://www.spiedigitallibrary.org/conference-proceedings-of-spie/9041/1/Automatic-detection-of-invasive-ductal-carcinoma-in-whole-slide-images/10.1117/12.2043872.short?SSO=1
      7. berta says:
        
        May 17, 2023 at 1:12 pm
        
        Hello, the dataset available above, are patches from images of the TCGA data portal?
      8. choosehappy says:
        
        May 18, 2023 at 9:45 am
        
        You would need to ask the first author of the study, we just adopted their patches for our work
Mattias says:

October 24, 2016 at 7:33 pm

Is it correct to say that in order to run this code we need the cases_train.txt (..val, test) files? Are you willing to provide them? My apologies if they are posted somewhere already.

Reply
1. choosehappy says:
  
  October 24, 2016 at 7:40 pm
  
  Hmmm, I thought they were already bundled in the zip file, but it seems they weren’t. I’ve just uploaded a new version of the zip file with these files included.
  
  Reply
Elton Rexhepaj says:

November 2, 2016 at 8:39 am

Hi Andrew,

I wanted to congratulate for the great work here. I read the paper on JPatholInform as well and I really appreciate the work done. DL might well be the way to go to address some very challenging DP applications for preclinical and research use.

I had one question with regard to this case but generally speaking to all the use cases. Have you tried other DL platforms beside Caffe and can you comment on this ?

thanks again for all your efforts.

Elton

Reply
1. choosehappy says:
  
  November 2, 2016 at 10:40 am
  
  Hi Elton,
  
  Thank you for your kind words, they’re much appreciated!
  
  To answer your question, I have used tensorflow, theano, and pylearn. Ultimately I think the “best” framework is one that matches the specific task. The others I mentioned are more geared towards deep learning research, in the sense that they are more easily extensible (e.g., python vs c++) when wanting to add novel layer types, learning schemes, etc. I find the Caffe/Digits combination to be very suitable towards deep learning application, wherein you take existing layers and learning schemes and apply them to new datasets. A good example is the JPI paper you mentioned, which is a pure DL application and requires no modification or extension of the DL framework.
  
  Reply
  1. Elton Rexhepaj says:
    
    November 7, 2016 at 8:17 am
    
    Hi,
    
    Thanks a lot for the quick reply. I tried a few of the pre-trained models in a few slides to detect tubuluar structures and T-cell infiltration. I think it works pretty good with a tendency to calling t-cell fibroblast nuclei or not very well differentiated tumour cells. Some of these you can filter afterwards with a priori knowledge of the distribution of the t-cells and based on the shape as well.
    
    I didn’t tried to train a model myself but before doing it just wanted to ask on the training issue if you think there might be a benefit to include a deconvolution + normalisation step for the heamatoxylin ? I found this particular useful on another pattern recogn application (tissue folding filtering) to reduce tissue preprocessing, staining biais due to center to center protocol variabilities.
    
    I don’t know if you have tried to include such preprocessing step and if you would have any recommendations in the context of the DL models you have been working on ? I read a few papers but most papers rely on the mean centering of the RGB projection of the HE sections.
    
    Thanks
    Elton
    
    Reply
    1. choosehappy says:
      
      November 7, 2016 at 9:26 am
      
      We’ve done quite a bit of testing in that realm using various preprocessing stain normalization techniques. Overall, when applied across all images in a cohort, we’ve seen modest performance *decreases* (1 or 2%), but perhaps only once an increase (~.2%). We have a few papers coming out where we show the results from these experiments, but they’re not very encouraging. Overall, this should make some type of sense, as the whole concept of deep learning is that the data shouldn’t be preprocessed (except mean centering), and should instead be delegated to the network to learn dynamically. This of course assumes that the data you’re training on is representative of the data you’re testing on, which is why its important to have multiple patients and (hopefully) different batches/cohorts/etc so that there is less of a correlation between samples (technical protocol, etc). We have seen some benefits on the “opposite” approach, wherein, instead of attempting to make all the images the same (as in normalization), instead, inject “noise” (color modifications, etc..basically creating synthetic real-world data with noise) to augment the dataset to improve robustness. Usually, its easier to make noisy images, which need not follow a pattern, than it is to create a single preprocessing technique which will operate well across all possible inputs.
      
      Reply
      1. Elton Rexhepaj says:
        
        November 7, 2016 at 10:32 am
        
        That’s interesting that you see decrease of performance after normalisation. You would think the opposite should happen but it can make sense as by removing the biais you go towards a perfect (biais free dataset) that when submitted to the DL framework will force the DL network to encode this state of the data. Then when the CD+Normalisation doesn’t work that well the performance of DL could be heavily impacted.
        
        Yes including data from several patients is a MUST but the difficulty with pathology data is two patients are note alike and difficult to create a training/test dataset that is representative of the cohort initially and the population afterwards. But I think your proposition of artificially alteration of the raw data is a good approach. We do this as well with regard to affine tranformations of the interesting ROIs but one can think to extend with regard to other biais factors such staining intensity etc.
        
        Thanks for the insight and looking forward the commig publications.
        
        Elton
      2. choosehappy says:
        
        November 7, 2016 at 10:38 am
        
        You can also check out this recent paper (not from my group), where they do HSV based augmentations:
        
        https://www2.warwick.ac.uk/fac/sci/dcs/people/research/csrkbb/tmi2016_ks.pdf
Elton Rexhepaj says:

November 7, 2016 at 10:42 am

Interesting. Thanks a mill !

Reply
Eric says:

November 14, 2016 at 8:39 am

Hi Andrew,

Thanks for sharing these invaluable information. I have a quick question here. For these ‘Segmentation task’, the training and testing data are all patch level. But the label is actually pixel level. Does that mean, when you have a patch which labelled as, say nuclei, it means the centre pixel of the patch is within a nuclei? And likewise, the output segmentation image is also done by pixel- by-pixel classification, and for each pixel a 32*32 surrounding patch is extracted and served as ‘feature’ for the classification?

Cheers,
Eric

Reply
1. choosehappy says:
  
  November 14, 2016 at 10:40 am
  
  That is correct. The center pixel determines the label for the patch. The patch itself provides the context for the decision.
  
  Reply
  1. Eric says:
    
    November 16, 2016 at 2:27 am
    
    Thanks for you quick reply, Andrew. If this is the case, i have a similar problem here. The goal is to estimate the volume of liver fat by the biopsy image. But the labelled data are limited (12000 in total and only 1000 are fat, so even using augmentation is still hard to get enough for training), I’m just wondering have you seen any public data for this problem. Or any trained network you have in hand is suitable for this problem ( the tissue images are quite similar to what you have, and the fat disappears after staining, which means appears as white blank)?
    
    Reply
    1. choosehappy says:
      
      November 16, 2016 at 7:18 am
      
      You can try transfer learning if you have a small dataset, a simple approach is described here:https://github.com/NVIDIA/DIGITS/tree/master/examples/fine-tuning
      
      Reply
      1. Elton says:
        
        November 18, 2016 at 6:27 am
        
        I will give it a try as well as some of the pre-trained models were not performing as I was expecting on some of our data (lymphocyte detection and tubule segmentation) but I believe if it is tissue to tissue changes that might cause this. Some of the pretrained features could be biaised a little bit by the tissue context and maybe tuning could improuve the performances.
        
        However on the fat quantification for liver biopsies you could do it faster following a more classical segmentation approach with a pre-processing step enhancing vacoules structures (series of convolutions with circles of various radius). CellProfiler has such approach already integrated into their list of analysis modules.
        
        Thanks for sharing !
      2. choosehappy says:
        
        November 18, 2016 at 8:28 am
        
        When moving from cohort to cohort there are a few things to take into account. Mainly, the magnifications need to match exactly. One can’t expect a classifier trained at 5x to work on images at 40x. That said, I’ve seen some variance across scanners as the microns per pixel (MPP) supplied by each vendor is different. When building models in-house this isn’t usually an issue since it is likely the “next” image will come from the same scanner as all the previous images. As well, individual labs have slightly different protocols involving stain concentration, specimen thickness, fixation procedure etc. All of these may have an impact on classifier performance. Again, a learn from data approach is only as “wise” as the data provided to it during training time. We’re seeing that when using a new cohort from a different institution, it helps to mark up a single image and use that to “fine-tune” the existing classifier for that task to gain robustness. Its usually a much smaller burden than say annotationg a bunch of new images.
Elton says:

November 18, 2016 at 10:46 am

I was keeping the same resolution as it was used during the training (based on what was indicated in the webpage). However, your point on the difference of scanner to scanner is valid. Aperio and 3DHistoTech have different optics for example. So for the same zoom level pixel size (micron per pixel) is different.

The data I was testing come from Aperio/Leica. 3DHistoTech has Nicon cameras which are also different from one instrument to another whether Aperio has the same resolution (micron per pixel).

Which instrument where the data for the training comming from ?

Reply
1. choosehappy says:
  
  November 18, 2016 at 10:50 am
  
  I would have to guess either Aperio or Ventana. The IDC data wasn’t curated by me, I obtained it from the authors i mentioned in the paper as they had previously published a paper using it and we wanted to use the same dataset. An easy way of figuring it out is to measure and compare the size of lymphocytes since their size is typically very consistent, even in the presence of cancer.
  
  Reply
  1. Elton says:
    
    November 18, 2016 at 11:18 am
    
    yeah indeed the lymphocytes differ of a few pixels from a few screenshots I was looking at.
    
    Reply
  2. amir says:
    
    June 25, 2017 at 3:47 am
    
    Hi – Thanks for sharing all of this information
    but i couldn’t download the dataset 🙁 the link doesen’t work correctly
    
    Reply
    1. choosehappy says:
      
      June 26, 2017 at 6:51 pm
      
      You can try again, it seems we have a catastrophic harddrive failure. Things should be back up now 🙂
      
      Reply
      1. amir says:
        
        June 27, 2017 at 8:03 am
        
        thank you so much dear
Kai says:

August 20, 2017 at 9:39 am

Hi,

thank you very much for sharing these invaluable information. Unfortunately I can not download the dataset. One question regarding IDC segmentation: have you included images with inflammatory/lymphocytic stroma?

Reply
1. Kai says:
  
  August 20, 2017 at 4:33 pm
  
  download was successful later. But where can I find the cases_(train/val/test).txt files ?
  
  Reply
  1. choosehappy says:
    
    August 21, 2017 at 7:32 am
    
    sorry about that, you can find the files in the appropriate github directory
    
    Reply
2. choosehappy says:
  
  August 21, 2017 at 7:24 am
  
  the data was sampled “fairly” across all the patients we had, so the distribution of the characteristics should be typical of what one expects to see in the wild
  
  Reply
Jamie says:

September 28, 2017 at 11:49 am

Hi,

Where did you upload the new Zip life? I need the training txt file but it’s not in the current Zip folder to download.
Thank you

Reply
1. choosehappy says:
  
  September 29, 2017 at 12:21 pm
  
  you can find them in the github directory for the IDC use case
  
  Reply
Jonas says:

October 2, 2017 at 8:24 am

Hi,

I was wondering how we should organize the database for this case.
Let’s say I have all the data in a folder :’USE_CASE_6_MITOSIS_DEETECTION’.
For Step 1, how should we organize the folders to run step 1?
USE_CASE_6_MITOSIS_DEETECTION/images/*all the images and folders here*
USE_CASE_6_MITOSIS_DEETECTION/subs
Where would the .txt files be?

Thank you

Reply
1. choosehappy says:
  
  October 2, 2017 at 8:36 am
  
  everything should be in the root directory except for subs, which go in the subs directory
  
  Reply
  1. Jonas says:
    
    October 3, 2017 at 8:38 am
    
    I tried running but kept getting the following messages from Matlab:
    
    identifier: ‘MATLAB:imagesci:imwrite:fileOpen’
    message: ‘Unable to open file “./subs/15902_idx5_x3001_y951_class11.png” for writing. You might not have write permission.’
    cause: {0×1 cell}
    stack: [2×1 struct]
    
    Reply
    1. Shir says:
      
      June 5, 2018 at 2:10 pm
      
      Hey!
      How did you solve this issue?
      I got the same message and I’m not sure what exactly is the problem there
      
      Reply
      1. choosehappy says:
        
        June 5, 2018 at 5:30 pm
        
        which error? unable to make directory?
  2. Jonas says:
    
    October 3, 2017 at 11:51 am
    
    But where do we put the subs directory? What goes in the subdirectory?
    
    Reply
    1. choosehappy says:
      
      October 3, 2017 at 8:21 pm
      
      you’re making this a lot harder than it needs to be. all downloaded files go into a single directory. then you run the provided scripts/commands. they produce whatever directory structure and files are necessary
      
      Reply
Nick says:

November 1, 2017 at 10:35 am

Is the f-measure provided in the paper based on all patches within the test dataset or is it the mean of the f-measures for each case within the test dataset?

Reply
1. choosehappy says:
  
  November 1, 2017 at 11:07 am
  
  all patches within the test dataset
  
  Reply
  1. Nick says:
    
    November 1, 2017 at 2:57 pm
    
    Thanks for your answer and the fast response!
    
    Trying to reproduce your results I used the “mean of f-measures computed for each case” and found the results to be inferior.
    
    I just found out about micro- and macro-averaging: http://rushdishams.blogspot.de/2011/08/micro-and-macro-average-of-precision.html
    
    Did you choose the micro- over the macro-averaging for a specific reason? Did you do the same for the epithelium dataset?
    
    Reply
    1. choosehappy says:
      
      November 1, 2017 at 3:54 pm
      
      To allow for a fair comparison against Cruz’s paper, who happens to be a colleague of mine, i used the same evaluation approach as he did. epi use case took the mean f-score across all images
      
      Reply
Ningkai Wu says:

December 22, 2017 at 9:36 am

Hi, thanks for the great tutorial. Currently I am trying to reproduce your USER CASE 6 IDC result. However, I do not understand how did you calculate the F score? According to step 4, we are predicting every pixel for each 32*32 test patch. However, we do not have the pixel level label for the test patch (because the label is for every center pixel). Also, where can I find some annotated samples for demo just like the one shown in step 4? Thanks!

Reply
1. choosehappy says:
  
  December 22, 2017 at 10:47 am
  
  thank you for your comment. as discussed in the manuscript, we used the same patches as the original author (who is a collaborator of mine). They in fact used the patches to compute their f-score in their paper and we followed suit in ours so that the metrics would be comparable. As a result, you only need to compute a single value per patch, and the center pixel determines the label of that patch. You make a good point though, when i wrote up this blog post i followed a template of the 6, but in fact should have removed step 4. sorry for the confusion!
  
  Reply
  1. Ningkai Wu says:
    
    December 23, 2017 at 7:24 am
    
    Thank you so much for the fast response. So in order to calculate the F-score, we make a prediction for each test patch? One more thing, are there any samples images and corresponding annotated images available just like the ones shown in step 4 ? Happy Christmas!
    
    Reply
    1. choosehappy says:
      
      December 28, 2017 at 11:15 am
      
      Yes to the first question. Not currently to the second. We’re working to get the original data and annotations publicly released but there are some holdups.
      
      Reply
Shamim says:

January 5, 2018 at 3:02 pm

Hi Respective Authors,
Thanks for a complete and well-defined research work. I couldn’t download training and test image file from the webpages “The data and training/test set partitions are located here (1.6G).”

So If possible to share the link where we can download training and test image file of IDC.

Reply
1. choosehappy says:
  
  January 5, 2018 at 3:03 pm
  
  http://andrewjanowczyk.com/wp-static/IDC_regular_ps50_idx5.zip
  
  Reply
affogato4me says:

January 26, 2018 at 11:31 pm

are the whole slides from which these patches were extracted available for download?

Reply
1. choosehappy says:
  
  January 26, 2018 at 11:33 pm
  
  Unfortunately, they aren’t our resource so we can’t release directly, but we’re in progress of obtaining official approval for a release
  
  Reply
Sol says:

March 31, 2018 at 7:40 pm

Hi,
I created a full PyTorch based Jupyter notebook using this dataset. It is available here:
https://github.com/bayesianio/applied-dl-2018/blob/master/lab-2-Breast-Cancer-Histopathology-SeNet.ipynb

Reply
1. choosehappy says:
  
  April 2, 2018 at 5:57 pm
  
  very cool, thanks for sharing!
  
  Reply
Reza says:

April 17, 2018 at 6:11 am

Hi
Thanks for your nice research work.

Can this data set “IDC_regular_ps50_idx5.zip” be used for writing research papers?

Reply
Reza says:

April 17, 2018 at 6:15 am

I mean, Can anybody allow to publish paper by using the IDC dataset?

Reply
1. choosehappy says:
  
  April 17, 2018 at 3:21 pm
  
  sure, you should also check this paper which uses the same dataset: https://www.nature.com/articles/srep46450 we’re in the process of releasing all of the ground truth for those samples now. it will be under the manuscript titled “High-throughput adaptive sampling for whole-slide histopathology image analysis (HASHI) via convolutional neural networks: application to invasive breast cancer detection ” in PLOSOne with authors: Angel Cruz-Roa,Hannah Gilmore Ajay Basavanhally Michael Feldman Shridar Ganesan Natalie Shih John Tomaszewski Anant Madabhushi Fabio González. should be available sometime this year
  
  Reply
Valentin Krasontovitsch says:

May 30, 2018 at 7:47 pm

Hey Andrew,

thanks for the article, and the great tutorial and code alongs : D
Really helpful if you’re trying to use deep learning in digital pathology and haven’t the faintest clue how to go about it ; )

An aha moment for me was to understand how you manage to train on data that has a boolean label, and then get out a prediction which assigns a value to each pixel – actually had to go into the code to understand how it’s done. I assume this is a common technique of machine learning (which I obviously wasn’t familiar with)? It would have helped me greatly to elaborate that point. Or perhaps you do that in one of the other tutorials…
Anyways, great work, and I also have a question: Currently downloading the data to have a look myself, but I’m pondering the problem of extracting patches from a set of regions of interest from some slide images. Wondering how to go about it…
Reading the original article that produced the data, it says

> The tile tissue sampling process involves extraction of square regions of the same size (200 × 200 μm), on a rectangular grid for each whole-slide image. Only tissue regions are invoked during the sampling process and any regions corresponding to non-tissue within the background of the slide are ignored.

Does that mean that all tiles on the grid were considered that contained tissue?

Cheers,
Valentin

Reply
1. Valentin Krasontovitsch says:
  
  May 31, 2018 at 4:15 pm
  
  Looking at the data, I can see that the patches don’t overlap, and are not chose randomly either. That pretty much clears it up : )
  
  Reply
  1. choosehappy says:
    
    June 5, 2018 at 5:29 pm
    
    great : ) let me know if you have any other questions
    
    Reply
Nadia says:

June 22, 2018 at 2:00 pm

Hi Andrew,
I have found the paper “High-throughput adaptive sampling for whole-slide histopathology image analysis (HASHI) via convolutional neural networks: application to invasive breast cancer detection”, and I have been able to download the TGCA dataset with the corresponding ground truth. You say that the paper uses the same dataset that you have used in your paper “Deep learning for digital pathology image analysis: a comprehensive tutorial with selected use cases”. However, I can not able to find a correspondence between the name of the folders of your dataset (available here: http://www.andrewjanowczyk.com/deep-learning/) and the name of the images used in this paper “High-throughput adaptive sampling for whole-slide histopathology image analysis (HASHI) via convolutional neural networks: application to invasive breast cancer detection”. Could you help me?
Thanks in advance

Reply
Nadia says:

June 22, 2018 at 3:16 pm

In detail, I should reconstruct the original image, but since the patches available here http://www.andrewjanowczyk.com/deep-learning/ do not cover the whole image, I cannot be able to reconstruct it, but only the annotated patches.

Reply
1. choosehappy says:
  
  July 9, 2018 at 10:03 am
  
  yes that is the case. the author of the paper you mention was able to provide the exact patches that they used, so for a fair technological comparison i used those as well. they’re highly biased based on his selection method and do not cover the entire image. if you want to do a pure technological comparison against our method, you should use the patches that i provide. if you want to show improved methodology, you should use the whole ground truth and resample as per your new method. hope that clears things up!
  
  Reply
Alan says:

August 11, 2018 at 1:05 pm

Hey Andrew,

Thanks for the article and the great dataset.
I find that there are only patches in the zip but the whole slide, so I suppose I can just use it to train a binary classifier right? But it’s seemed that the output images are like the whole slide images.So did I miss something important like how to splice the patches to the whole slide ?

Reply
1. choosehappy says:
  
  August 14, 2018 at 11:40 am
  
  correct. only patches are available from this manuscript. you should check http://journals.plos.org/plosone/article?id=10.1371/journal.pone.0196828
  
  Reply
  1. Alan says:
    
    August 16, 2018 at 7:31 am
    
    Thanks 😀 , it helps me a lot.
    
    Reply
    1. choosehappy says:
      
      August 17, 2018 at 8:27 am
      
      sure, no problem!
      
      Reply
Joan says:

May 2, 2019 at 12:10 pm

Hey Andrew,

First of all, thanks for this awesome tools that you post here.

I am trying to use this dataset in order to perform IDC identification on TCGA breast one. However, I wonder if you are sure that your images are actually 40x. I am comparing nuclei size and I think that they’re more around 5-10x.

I’m not completely sure since the 50×50 size is not quite good for assess it, could your confirm/reject this?

Reply
1. choosehappy says:
  
  May 2, 2019 at 5:02 pm
  
  I think we discussed this in the manuscript:
  
  http://www.jpathinformatics.org/article.asp?issn=2153-3539;year=2016;volume=7;issue=1;spage=29;epage=29;aulast=Janowczyk;type=3
  
  Patch selection technique
  
  To provide sufficient context (as discussed above in epithelium segmentation section), the authors have down sampled their original ×40 images by a factor of 16:1, for an apparent magnification of ×2.5. We attempted three different approaches of using these 50 × 50 patches, and casting them into our 32 × 32 solution domain:
  
  Reply
AI lover says:

December 9, 2022 at 10:44 pm

Hi Andrew!
Thanks for sharing this dataset. I have two questions about it:
1- How did you decide what to label each patch? Can we say if central pixel of patch is in benign tumor, then we label the patch with zero? Or we should count the number of benign and malignant pixels in patch and check which one is a larger number then select the patch label based on the larger group?

2- Should we make a classifier just based on patches? I mean, doesn’t the number 162 affect our calculations? Is it important to know each patch is from which WSI? If it matters, how should we identify the WSI number that our patch came from?

Thanks

Reply
1. choosehappy says:
  
  December 11, 2022 at 7:38 pm
  
  1. Both approaches are valid, many people use a 50% threshold to determine the +/- label of the patch. that said, i think using the center will tend to produce more robust results since it will provide a more consistent signal to the DL model
  
  2. strictly speaking you don’t *need* any information to train the model except the patch and the label. if you want additional information, you can store it in the database. the new approaches we’re using are described here – Digital pathology classification using Pytorch + Densenet and this one Digital Pathology Segmentation using Pytorch + Unet
  
  Reply