Digital pathology image analysis requires high-quality input images. While there are a large number of images available in The Cancer Genome Atlas (TCGA), many of the slides in the data portal are frozen specimens, which are *not* suitable for computational analysis. This post discusses how to download the Formalin-Fixed Paraffin-Embedded (FFPE) slides for the corresponding patients.

First, a brief introduction: the TCGA offers two types of slides, flash frozen and Formalin-Fixed Paraffin-Embedded (FFPE). Flash frozen samples are typically produced during surgery in a cryolab to help the surgeon determine whether the borders of the tumor are clean (i.e., whether the tumor has been fully resected). Flash freezing is a fast and “easy” process, but it frequently leaves the tissue damaged, giving it a Swiss-cheese appearance:


FFPE slides are the gold standard for diagnostic medicine, and are generated by fixing a specimen in formaldehyde and then embedding it in a paraffin wax block for cutting. They have a much cleaner appearance, making them more amenable to computational analysis:


A fuller discussion is available here and here.

The TCGA has both types of slides available, so care must be taken to obtain the correct cohort and *not* mix cohorts unless specifically part of your experimental design.

The two types can be told apart by filename. Files containing “TS#” or “BS#”, where # is an integer, are frozen slides, like this:


Files containing “DX#”, again where # is an integer, are FFPE slides:
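
As a rough sketch of this naming rule, the snippet below labels a filename as frozen or FFPE and counts each type in a list, which can help catch accidentally mixed cohorts. The function names and the example filenames are hypothetical and made up for illustration; only the TS#/BS#/DX# convention comes from the description above.

```python
import re
from collections import Counter

# Slide codes as described in the post: TS# / BS# are frozen, DX# is FFPE.
# Assumption: the code is preceded by a hyphen in the filename.
SLIDE_CODE = re.compile(r"-(TS|BS|DX)\d+")


def slide_type(filename):
    """Return 'ffpe', 'frozen', or 'unknown' for a slide filename."""
    match = SLIDE_CODE.search(filename)
    if match is None:
        return "unknown"
    return "ffpe" if match.group(1) == "DX" else "frozen"


def cohort_counts(filenames):
    """Count how many filenames fall into each slide type."""
    return Counter(slide_type(name) for name in filenames)


# Hypothetical filenames, used only to exercise the rule.
examples = [
    "TCGA-XX-0001-01Z-00-DX1.svs",
    "TCGA-XX-0001-01A-01-TS1.svs",
    "TCGA-XX-0001-01A-01-BS2.svs",
    "notes.txt",
]

counts = cohort_counts(examples)
if counts["ffpe"] and counts["frozen"]:
    print("Warning: this list mixes frozen and FFPE slides:", dict(counts))
```

Running a check like this over a download folder before an experiment is a cheap way to confirm that only the intended slide type is present.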


To perform the download, we need two components: (1) the TCGA download tool (the gdc-client used below), and (2) a manifest file that lists, by precise ID number, which files to download.

First we need to go to the TCGA data portal, located here:

Then we click on “Repository”:

Then click on “slide image” under “Data type”:

Then “Diagnostic Slide” under “Experimental Strategy”:

This produces a list of slides, all of which have the “DX#” string in their filename:

We can limit the results to a specific organ group by clicking, e.g., “Cases” and then “breast”:

Now we have the 1,133 files that we would like to download. We do this by clicking “add all files to cart” (or selecting the ones we are interested in):

Lastly, we go to the cart and select “Download” → “Manifest”:

This provides us with a txt file that we can feed to the gdc-client:

gdc-client download -m gdc_manifest_20180801_125430.txt
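
If a manifest was built without the “Diagnostic Slide” filter, it can be trimmed to FFPE slides before it is handed to the gdc-client. The sketch below assumes the manifest is a tab-separated text file with a header row that includes a `filename` column; that layout is an assumption here and is worth confirming against your own file. The function name and the sample rows are hypothetical.

```python
import csv
import io
import re

# FFPE slides carry a "DX#" code in the filename, per the naming rule above.
FFPE_CODE = re.compile(r"-DX\d+")


def filter_ffpe(manifest_in, manifest_out):
    """Copy only the FFPE (DX#) rows from one manifest to another.

    Both arguments are open text file objects. Returns (kept, dropped).
    Assumes a tab-separated manifest with a 'filename' header column.
    """
    reader = csv.DictReader(manifest_in, delimiter="\t")
    if not reader.fieldnames or "filename" not in reader.fieldnames:
        raise ValueError("manifest has no 'filename' column")
    writer = csv.DictWriter(
        manifest_out,
        fieldnames=reader.fieldnames,
        delimiter="\t",
        lineterminator="\n",
    )
    writer.writeheader()
    kept = dropped = 0
    for row in reader:
        if FFPE_CODE.search(row["filename"]):
            writer.writerow(row)
            kept += 1
        else:
            dropped += 1
    return kept, dropped


# Hypothetical manifest content, used only to demonstrate the filter.
sample = (
    "id\tfilename\tmd5\tsize\tstate\n"
    "id-1\tTCGA-XX-0001-01Z-00-DX1.svs\tmd5-1\t100\treleased\n"
    "id-2\tTCGA-XX-0001-01A-01-TS1.svs\tmd5-2\t200\treleased\n"
)
filtered = io.StringIO()
result = filter_ffpe(io.StringIO(sample), filtered)
```

With real files, open the input and output with `open(path, newline="")` and pass the filtered manifest to `gdc-client download -m` in place of the original.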

That’s it!

