Download TCGA Digital Pathology Images (FFPE)

Digital pathology image analysis requires high-quality input images. While a large number of images are available in The Cancer Genome Atlas (TCGA), the ones currently available in the data portal are frozen specimens and are *not* suitable for computational analysis. This post discusses how to download the Formalin-Fixed Paraffin-Embedded (FFPE) slides for the corresponding patients.

First, a brief introduction: TCGA offers two types of slides, flash frozen and Formalin-Fixed Paraffin-Embedded (FFPE). Flash-frozen samples are typically produced during surgery in a cryolab to help the surgeon determine whether the borders of the tumor are clean (i.e., whether the tumor has been fully resected). Flash freezing is a fast and “easy” process, but it frequently leaves the tissue damaged, giving it a Swiss-cheese appearance:

[Image: example frozen slide]

FFPE slides are the gold standard for diagnostic medicine. They are generated by fixing a specimen in formaldehyde and then embedding it in a paraffin wax block for cutting. The resulting tissue has a much cleaner appearance, making it more amenable to computational analysis:

[Image: example FFPE slide]

A fuller discussion is available here and here.

The TCGA has both types of slides available, so care must be taken to obtain the correct cohort and *not* mix cohorts unless specifically part of your experimental design.

The two types can be told apart by filename. A file whose name contains “TS#” or “BS#”, where # is an integer, is a frozen slide, like this:

TCGA-CH-5765-11A-01-TS1.2a1faf76-526b-4581-b947-e8d733674df7.svs

A file whose name contains “DX#”, again where # is an integer, is an FFPE slide:

TCGA-14-0786-01Z-00-DX2.9dd57cfe-f467-4796-a491-48b737a6248c.svs
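The naming rule above can be checked programmatically before any analysis. The following is a minimal sketch, assuming the slide code always appears as "-TS#.", "-BS#." or "-DX#." immediately before the UUID, as in the two example filenames; the function name `slide_type` is purely illustrative.

```python
import re

# Matches the slide code in a TCGA slide filename, e.g. "-TS1." or "-DX2."
# Assumption: the code sits between a hyphen and the dot preceding the UUID.
SLIDE_CODE = re.compile(r"-(TS|BS|DX)(\d+)\.")


def slide_type(filename):
    """Return 'frozen', 'ffpe', or 'unknown' based on the TS#/BS#/DX# code."""
    match = SLIDE_CODE.search(filename)
    if match is None:
        return "unknown"
    # DX# marks an FFPE (diagnostic) slide; TS# and BS# mark frozen slides.
    return "ffpe" if match.group(1) == "DX" else "frozen"


if __name__ == "__main__":
    for name in (
        "TCGA-CH-5765-11A-01-TS1.2a1faf76-526b-4581-b947-e8d733674df7.svs",
        "TCGA-14-0786-01Z-00-DX2.9dd57cfe-f467-4796-a491-48b737a6248c.svs",
    ):
        print(name, slide_type(name))
```

Returning "unknown" rather than guessing makes it easy to spot filenames that do not follow the expected pattern.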

To perform the download, we need two components: (1) the TCGA download tool (gdc-client), and (2) a manifest file that lists, by precise ID, which files to download.

First, we go to the TCGA data portal, located here: https://portal.gdc.cancer.gov

Then we click on “Repository”:

[Screenshot: GDC portal]

Then click on “slide image” under “Data type”:

[Screenshot: Repository page]

Then click on “Diagnostic Slide” under “Experimental Strategy”:

[Screenshot: Repository page]

This produces a list of slides, all of which have the “DX#” string in their filename:

[Screenshot: Repository page]

We can limit the results to a specific organ group by clicking, e.g., “Cases” and then “breast”:

[Screenshot: Repository page]
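For readers who prefer to script the selection, the three filters chosen above can be written down as a nested filter expression of the kind the GDC API accepts. This is only a sketch: the field names (`files.data_type`, `files.experimental_strategy`, `cases.primary_site`) and the value spellings are assumptions mirrored from the portal's facet labels, and the helper `build_slide_filter` is hypothetical. Check them against the GDC API documentation before relying on them.

```python
import json


def build_slide_filter(primary_site):
    """Build a GDC-style filter for diagnostic (FFPE) slide images.

    Field names and values are assumptions taken from the portal facets.
    """

    def is_in(field, value):
        # One clause: the given field must take the given value.
        return {"op": "in", "content": {"field": field, "value": [value]}}

    # All three clauses must hold, mirroring the three clicks in the portal.
    return {
        "op": "and",
        "content": [
            is_in("files.data_type", "Slide Image"),
            is_in("files.experimental_strategy", "Diagnostic Slide"),
            is_in("cases.primary_site", primary_site),
        ],
    }


if __name__ == "__main__":
    print(json.dumps(build_slide_filter("breast"), indent=2))
```

The sketch only builds and prints the filter; it does not contact any server, so the manifest itself still comes from the portal as described in this post.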

Now we have the 1,133 files that we would like to download. We do this by clicking “add all files to cart” (or selecting the ones we are interested in):

[Screenshot: Repository page]

Lastly, we go to the cart and select Download > Manifest:

[Screenshot: Cart page]

This provides us with a txt file that we can feed to the gdc-client:

gdc-client download -m gdc_manifest_20180801_125430.txt
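Before starting a large download, it can be worth confirming that the manifest contains only FFPE slides. The sketch below assumes the manifest is a tab-separated file with a header row that includes a `filename` column; the function name `non_ffpe_filenames` is illustrative and not part of gdc-client.

```python
import csv
import re

# DX# followed by a dot marks an FFPE (diagnostic) slide filename.
DX_CODE = re.compile(r"-DX\d+\.")


def non_ffpe_filenames(manifest_lines):
    """Return the filenames in a manifest that lack a DX# slide code.

    `manifest_lines` is any iterable of text lines, such as an open file.
    Assumption: tab-separated columns with a header containing 'filename'.
    """
    reader = csv.DictReader(manifest_lines, delimiter="\t")
    return [
        row["filename"]
        for row in reader
        if not DX_CODE.search(row["filename"])
    ]
```

Passing the open manifest file (for example, `open("gdc_manifest_20180801_125430.txt", newline="")`) should yield an empty list if every entry is a “DX#” slide; any filename that is returned deserves a second look before downloading.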

That's it!
