Download TCGA Digital Pathology Images (FFPE)

Digital pathology image analysis requires high-quality input images. While The Cancer Genome Atlas (TCGA) makes a large number of images available, many of those in the data portal are frozen specimens, which are *not* suitable for computational analysis. This post discusses how to download the Formalin-Fixed Paraffin-Embedded (FFPE) slides for the corresponding patients.

First, a brief introduction: the TCGA offers two types of slides, flash frozen and Formalin-Fixed Paraffin-Embedded (FFPE). Flash frozen samples are typically produced during surgery in a cryolab to help the surgeon determine whether the borders of the tumor are clean (i.e., whether the tumor has been fully resected). Flash freezing is a fast and “easy” process, but it frequently leaves the tissue damaged, giving it a Swiss-cheese appearance:

[Image: frozen slide example]

FFPE slides are the gold standard for diagnostic medicine, and are generated by fixing a specimen in formaldehyde and then embedding it in a paraffin wax block for cutting. The resulting tissue has a much cleaner appearance, making it more amenable to computational analysis:

[Image: FFPE slide example]

A fuller discussion is available here and here.

The TCGA has both types of slides available, so care must be taken to obtain the correct cohort and *not* mix cohorts unless mixing them is specifically part of your experimental design.

The two types can be told apart by the filename. Files containing “TS#” or “BS#”, where # is an integer, are frozen slides, like this:

TCGA-CH-5765-11A-01-TS1.2a1faf76-526b-4581-b947-e8d733674df7.svs

Files containing “DX#”, again where # is an integer, are FFPE slides:

TCGA-14-0786-01Z-00-DX2.9dd57cfe-f467-4796-a491-48b737a6248c.svs
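The naming rule above lends itself to a quick programmatic check. The following is a minimal Python sketch, not an official GDC utility: the helper name `slide_type` and the regular expression are assumptions made for illustration, and the pattern is based only on the two example filenames shown here. It also accepts letter suffixes after the prefix, which is a guess rather than documented behaviour.

```python
import re

# Assumed pattern: the slide code is the last dash-separated token before the
# first dot, e.g. "...-TS1.<uuid>.svs" or "...-DX2.<uuid>.svs".
SLIDE_CODE = re.compile(r"-(TS|BS|DX)([0-9A-Z]+)\.")


def slide_type(filename):
    """Return 'frozen', 'ffpe', or 'unknown' for a TCGA slide filename."""
    match = SLIDE_CODE.search(filename)
    if match is None:
        return "unknown"
    # "DX" marks a diagnostic (FFPE) slide; "TS" and "BS" mark frozen slides.
    return "ffpe" if match.group(1) == "DX" else "frozen"


print(slide_type("TCGA-CH-5765-11A-01-TS1.2a1faf76-526b-4581-b947-e8d733674df7.svs"))
print(slide_type("TCGA-14-0786-01Z-00-DX2.9dd57cfe-f467-4796-a491-48b737a6248c.svs"))
```

Anchoring the pattern on the dot that follows the slide code keeps it from matching two-letter fields that appear earlier in the filename.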

To perform the download, we need two components: (1) the TCGA download tool (gdc-client), and (2) a manifest file that lists, by precise ID, which files to download.

First, we go to the GDC data portal, which hosts the TCGA data: https://portal.gdc.cancer.gov

Then we click on “Repository”:

[Screenshot: GDC portal]

Then click on “slide image” under “Data type”:

[Screenshot: GDC Repository page]

Then select “Diagnostic Slide” under “Experimental Strategy”:

[Screenshot: GDC Repository page]

This produces a list of slides, all of which have the “DX#” string in their filename:

[Screenshot: GDC Repository page]

We can limit the results to a specific organ by clicking, e.g., “Cases” and then “breast”:

[Screenshot: GDC Repository page]

Now we have the 1,133 files that we would like to download. We do this by clicking “add all files to cart” (or selecting the ones we are interested in):

[Screenshot: GDC Repository page]

Lastly, we go to the cart and select “Download” -> “Manifest”:

[Screenshot: GDC Cart page]

This provides us with a txt file that we can feed to the gdc-client:

gdc-client download -m gdc_manifest_20180801_125430.txt
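Before starting a long download, it can be worth confirming that the manifest really contains only “DX” files. A minimal Python sketch follows; it assumes the manifest is tab-separated with a header row that includes a `filename` column, and the helper name `non_ffpe_rows` and the in-memory example manifest (with placeholder id, md5, and size values) are made up for illustration.

```python
import csv
import io
import re

# Assumed marker of an FFPE (diagnostic) slide in the filename.
DX_CODE = re.compile(r"-DX[0-9A-Z]+\.")


def non_ffpe_rows(manifest_file):
    """Return the manifest rows whose filename lacks a DX slide code."""
    reader = csv.DictReader(manifest_file, delimiter="\t")
    return [row for row in reader if not DX_CODE.search(row["filename"])]


# Tiny in-memory manifest for illustration only; id, md5, and size are placeholders.
example = io.StringIO(
    "id\tfilename\tmd5\tsize\tstate\n"
    "placeholder-id-1\t"
    "TCGA-14-0786-01Z-00-DX2.9dd57cfe-f467-4796-a491-48b737a6248c.svs\t0\t0\treleased\n"
    "placeholder-id-2\t"
    "TCGA-CH-5765-11A-01-TS1.2a1faf76-526b-4581-b947-e8d733674df7.svs\t0\t0\treleased\n"
)

suspect = non_ffpe_rows(example)
print([row["filename"] for row in suspect])
```

For a real manifest, the same function can be given an open file handle, e.g. `open("gdc_manifest_20180801_125430.txt", newline="")`; an empty result suggests the cohort is not mixed.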

That’s it!

8 thoughts on “Download TCGA Digital Pathology Images (FFPE)”

    1. You either need to make them yourself, or find a published paper which has used them and ask the authors for whatever annotations you’re interested in.

  1. Is there any formal document from GDC stating that files with “TS#” or “BS#” are frozen slides, and files with “DX#” are FFPE slides? I found some files with “TSA” or “TSB” and don’t know what they mean, so I am really confused.

    1. I don’t know of any; if you find one, please let me know :) The TS and BS stand for “top slide” and “bottom slide”, and are used during surgery to ensure that the resection has clean boundaries. Since the patient is still on the operating table, these are always flash frozen. “Diagnostic” slides are by definition FFPE. This can be seen in the data portal under “Experimental Strategy”, where there are two options: “Tissue Slide” (frozen) and “Diagnostic Slide” (FFPE). I’m not sure there will be a formal document explicitly saying this, since it’s fairly routine practice to my knowledge.
