One of the challenges in working in digital pathology is that the associated images can be excessively large, too large to load fully into memory, as well as too large to use in common pipelines. For example, a Aperio SVS file that we’ll look at today is 60,000 x 42,600 pixels. If we tried to load such an image, in RGB space, uncompressed it would require ~7GB, making it too large to consider using in our deep learning pipelines as there wouldn’t be enough RAM on the GPU for both the data and the filter activations.
The obvious way of managing this situation is to split the image into smaller tiles, operate on them separately, and merge the images back together. While we have wrappers which do this in a reasonable fashion in common languages, I was looking for a much more generic way, which affords the opportunity for additional speed ups. As a result, I’ll run through a process I developed using various snippets of code on the net.
The basic premise is that we can use Matlab to split the image in an organized way, with as much code re-use as possible. In fact, this isn’t nearly as difficult as one might expect, except that SVS files contain multiple pages, which contain the same image at different magnifications for improved image navigation. Lets look at some code.
Breaking the image apart
The code for this part is very similar to this blog post which leverages the idea of using image adapters to define how large images are read.
- tileSize = [2000, 2000]; % has to be a multiple of 16.
- input_svs_page=3; %the page of the svs file we're interested in loading
- svs_adapter =PagedTiffAdapter(input_svs_file,input_svs_page); %create an adapter which modulates how the large svs file is accessed
- fun=@(block) imwrite(block.data,sprintf('%s_%d_%d.png',baseFilename,block.location(1),block.location(2))); %make a function which saves the individual tile with the row/column information in the filename so that we can refind this tile later
- blockproc(svs_adapter,tileSize,fun); %perform the splitting
While the tileSize needs to be a multiple of 16, that isn’t a constraint at this stage. Its actually a requirement to save large tif images as described in the following part. We can see here the basic premise is straight forward, we use blockproc to iterate through the image and save sub images. At this point, if we want to process the tiles we can (for example by resizing smaller, or actually doing the analysis). I didn’t opt for that here for two reasons, (a) the deep learning pipeline I have isn’t in matlab its in python, so these tiles will be used outside of this development environment and (b) since now we have all the tiles, we can easily parallelize whatever processing we’re interested in and compute the output for multiple tiles at the same time.
So, we start with this image:
And after splitting, we can see that there are now multiple, non-overlapping tiles:
Note that not all the images are 2000 x 2000. Also, we can see that some of them consist entirely of background. We can leverage this fact to avoid computation of the entire panel should we desire (for example, by requiring a minimum number of pixels to be non-background via a color threshold).
Now that we have our tiles, we can compute their respective output. Nothing too surprising here
We can see the output here:
Putting it back together
Having the output means its time to stitch the images back together.
- outFile='36729.tif'; %desired output filename
- inFileInfo=imfinfo(input_svs_file); %need to figure out what the final output size should be to create the emtpy tif that will be filled in
- inFileInfo=inFileInfo(input_svs_page); %imfinfo returns a struct for each individual page, we again select the page we're interested in
- outFileWriter = bigTiffWriter(outFile, inFileInfo.Height, inFileInfo.Width, tileSize(1), tileSize(1),true); %create another image adapter for output writing
- fun=@(block) imresize(repmat(imread(sprintf('%s_%d_%d_prob.png',baseFilename,block.location(1),block.location(2))),[1 1 3]),1.666666666); %load the output image, which has an expected filename (the two locations added). In this case my output is 60% smaller than the original image, so i'll scale it back up
- blockproc(svs_adapter,tileSize,fun,'Destination',outFileWriter); %do the blockproc again, which will result in the same row/column coordinates, except now we specify the output image adatper to write the flie outwards
- outFileWriter.close(); %close the file when we're done
This process should be straight forward. We need to specify the desired output filename (ending in .tif, of course). Then we leverage the bigTiffWriter provided here, to incrementally fill in the final image. Notice that we’ve made the strong assumption that blockproc is deterministic in that given the same image it will always crop the tiles at the same places, which is in fact true. The only small difference here is that my output images are of a different size (due to the deep learning pipeline that created the output), so I take this opportunity to scale them back up to the expected size. Also, I’ve added the option to my bigTiffWriter to support compression, which is the 6th argument in the constructor.
This final image is 10MB, and is nicely stitched back together. I lovingly call this process humpty dumpty. We can see that the DL pipeline is doing a great job of identifying the cribriform pattern, but that conversation is for another day : )
Code is available here.