Aperio scanners generate a semi-proprietary file format called SVS. At its heart, an SVS file is really a multi-page TIFF file that stores a pyramid of progressively smaller versions of the original image. We’ll look at those here using an SVS file provided by the TCGA (http://cancergenome.nih.gov/) breast cancer cohort.
First we’ll open it in Aperio ImageScope to see what we’re looking at:
We already see four different views of the same image: a high-level view, a low-level view, a thumbnail and a “working area”. The working area allows us to zoom in and out of regions of interest (ROI).
If we use Image->Information, we can see some additional information:
This tells us the apparent magnification at which the specimen was scanned (40X) and the microns-per-pixel (MPP) value, both of which are important to know when working with images from different sources.
Additionally, we can take a peek at the various layers in the Pyramid:
We can see that at the base, the image is 94k x 80k pixels. This is the original image at full resolution, scanned at the apparent magnification (40X). The remaining levels are down-sampled versions of the original. At Level 1 the ratio is 4:1, meaning that each side of the image stored there is 25% of the length of the corresponding side of the original, so the level holds 1/16 of the pixels.
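To make the ratio arithmetic concrete, here is a minimal sketch that computes the approximate size of each level. It assumes the rounded 94k x 80k base size quoted above (not the exact pixel counts) and the 4:1, 16:1 and 32:1 ratios from the Aperio convention; the variable names are illustrative only.

```matlab
% Rounded base size taken from the text; the real file's exact dimensions differ.
base_size = [94000 80000];   % [width height] in pixels
ratios    = [1 4 16 32];     % per-side down-sampling ratios (illustrative)

for r = ratios
    level_size = round(base_size / r);   % each side shrinks by the ratio
    pixel_frac = 1 / r^2;                % so the pixel count shrinks by ratio^2
    fprintf('%2d:1 -> %6d x %6d pixels (%.4f of the base pixels)\n', ...
        r, level_size(1), level_size(2), pixel_frac);
end
```

The exact sizes stored in a given file may differ by a pixel or so from this rounding, so the values reported by the file itself should always take precedence.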
The main reason for this is simple: loading the entire image is very time consuming, if not impossible given memory constraints. If we display the lowest-resolution level that still fills every pixel on the screen, the user essentially can’t tell the difference. The levels of the pyramid are linked to one another by their ratios, so a location known at one level can easily be mapped to any other level (future tutorial). Because each TIFF page is stored as tiles, it is possible to load only the necessary tiles (in near real time), so that as the user zooms in, the required tiles are loaded and interpolated for display.
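As a rough illustration of how the ratios link the levels, the sketch below maps a pixel coordinate from one level to another. The helper `map_point` is a hypothetical name introduced here, and the convention it uses (1-based coordinates, mapping to the top-left pixel of the matching block) is an assumption; the approach in the follow-up tutorial may differ.

```matlab
% Map a 1-based [x y] pixel coordinate between two pyramid levels.
% ratio_from and ratio_to are each level's down-sampling ratio relative to the base.
map_point = @(xy, ratio_from, ratio_to) ...
    floor((xy - 1) * ratio_from / ratio_to) + 1;

xy_low  = [500 300];                   % a point picked on the 16:1 level
xy_base = map_point(xy_low, 16, 1);    % top-left pixel of the matching 16 x 16 block at the base
xy_back = map_point(xy_base, 1, 16);   % mapping back recovers the original point

fprintf('16:1 level (%d, %d) -> base (%d, %d)\n', xy_low, xy_base);
```

Subtracting 1 before scaling and adding it back afterwards keeps the mapping consistent with Matlab's 1-based indexing.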
Great. So how can we use these images in Matlab?
Matlab natively supports multi-page TIFF reading: pass the page number through the 'Index' parameter of imread.
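A minimal sketch of such a call is shown here. The file name `example.svs` is a placeholder rather than the actual TCGA file, `read_page` is a hypothetical convenience wrapper, and page 3 is chosen on the assumption that it holds one of the smaller down-sampled levels.

```matlab
% Placeholder file name; replace it with the path to your own SVS file.
svs_file = 'example.svs';

% Thin wrapper around imread's 'Index' option for multi-page TIFF files.
read_page = @(fname, page) imread(fname, 'Index', page);

if exist(svs_file, 'file')
    io = read_page(svs_file, 3);   % assumed to be a down-sampled level
    imshow(io);
end
```

Starting with a higher page number is the safer choice, since reading page 1 asks Matlab to load the full-resolution image into memory.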
But this isn’t the whole story. Using Matlab’s image information function, imfinfo, we see that there are additional pages present in the TIFF file.
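One way to list the pages is sketched below. It assumes the same placeholder file name as before, and `page_sizes` is a hypothetical helper; `imfinfo` returns one struct per TIFF page, each with `Width` and `Height` fields.

```matlab
% Placeholder file name; replace it with the path to your own SVS file.
svs_file = 'example.svs';

% Collect [width height] for every page described by an imfinfo struct array.
page_sizes = @(info) [[info.Width]; [info.Height]]';

if exist(svs_file, 'file')
    sizes = page_sizes(imfinfo(svs_file));
    for k = 1:size(sizes, 1)
        fprintf('page %d: %d x %d\n', k, sizes(k, 1), sizes(k, 2));
    end
end
```

Because only the headers are read, this is cheap even for very large files, which makes it a sensible first step before deciding which page to load.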
What are the additional images?
Page 6 contains the ID. This is usually a patient number or some other type of identification number written (or printed) at the end of the slide.
Page 7 contains a view of the entire slide, on which the scanned region has been automatically cropped (green box) to show only the area where there is material. Imagine having to store values for “everything”, even the white space of the microscope slide where there is no additional information to be had. So this page shows us, from a high-level view of the entire slide, where the scanned region was taken from.
We can see that Aperio knows about these pages and uses them for its front end, but does not report them in the information panel. Matlab, on the other hand, has access to all of the information available.
This makes things a bit more tricky, as the numbers of layers reported by Aperio and Matlab don’t line up, but the convention as defined by Aperio is rather straightforward:
| Level | Contents |
|---|---|
| First level | Full resolution image |
| Second level | Thumbnail |
| Third level to N-2 level | A reduction by a power of 2 (4:1 ratio, 16:1 ratio, 32:1 ratio, etc.) |
| N-1 level | Slide label |
| N level | Entire slide with cropped region delineated in green |
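The convention can be turned into a small lookup, sketched here for the 7-page file discussed above. It assumes that page 2 holds the thumbnail seen in ImageScope and that both the label and the whole-slide pages are present at the end of the file, which may not hold for every SVS file; the role strings are illustrative.

```matlab
n_pages = 7;                 % page count of the file discussed above
roles = cell(n_pages, 1);

for k = 1:n_pages
    if k == 1
        roles{k} = 'full-resolution image';
    elseif k == 2
        roles{k} = 'thumbnail';                    % assumption, see lead-in
    elseif k == n_pages - 1
        roles{k} = 'slide label';
    elseif k == n_pages
        roles{k} = 'entire slide with cropped region';
    else
        roles{k} = 'down-sampled pyramid level';
    end
    fprintf('page %d: %s\n', k, roles{k});
end
```

For a real file, `n_pages` would come from the number of entries returned by `imfinfo` rather than being hard-coded.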
With this information at hand, we can decide which level we want to load and can at least start to do some work. The next part of this tutorial discusses how to load only specific sub-sections of the high-resolution image given lower-resolution information.