Dividing and re-merging large images (Humpty Dumpty)

One of the challenges of working in digital pathology is that the associated images can be excessively large: too large to load fully into memory, and too large to use in common pipelines. For example, an Aperio SVS file that we'll look at today is 60,000 x 42,600 pixels. If we tried to load such an image uncompressed, in RGB space, it would require ~7.7GB (60,000 x 42,600 pixels x 3 bytes per pixel), making it too large to consider using in our deep learning pipelines, as there wouldn't be enough RAM on the GPU for both the data and the filter activations.

The obvious way to manage this situation is to split the image into smaller tiles, operate on them separately, and merge them back together. While wrappers exist which do this in a reasonable fashion in common languages, I was looking for a much more generic approach, one which affords the opportunity for additional speed-ups. As a result, I'll run through a process I developed using various snippets of code from around the net.

The basic premise is that we can use Matlab to split the image in an organized way, with as much code re-use as possible. In fact, this isn't nearly as difficult as one might expect, except that SVS files contain multiple pages, which hold the same image at different magnifications for improved image navigation (a quick way to inspect them is sketched below). Let's look at some code.
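As a quick aside, here is a minimal sketch for listing what each page holds, assuming Matlab can parse the file's TIFF headers:

  info = imfinfo('36729.svs'); % imfinfo returns one struct per page/magnification level
  for p = 1:numel(info)
      fprintf('page %d: %d x %d pixels\n', p, info(p).Width, info(p).Height);
  end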

Breaking the image apart

The code for this part is very similar to this blog post, which leverages the idea of using image adapters to define how large images are read.
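For reference, here is a minimal sketch of what such an adapter might look like; the actual PagedTiffAdapter from that post may differ in its details, and this version assumes imread can decode the requested page:

  classdef PagedTiffAdapter < ImageAdapter
      properties
          Filename
          Page
      end
      methods
          function obj = PagedTiffAdapter(filename, page)
              obj.Filename = filename;
              obj.Page = page;
              info = imfinfo(filename);
              obj.ImageSize = [info(page).Height info(page).Width 3]; % rows x cols x RGB
          end
          function data = readRegion(obj, region_start, region_size)
              % read only the requested tile from the requested page
              rows = [region_start(1), region_start(1) + region_size(1) - 1];
              cols = [region_start(2), region_start(2) + region_size(2) - 1];
              data = imread(obj.Filename, 'Index', obj.Page, 'PixelRegion', {rows, cols});
          end
          function close(obj) % nothing to clean up; imread opens and closes the file per call
          end
      end
  end

With an adapter in hand, the splitting itself is short: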

  tileSize = [2000, 2000]; % has to be a multiple of 16 (see the note below)

  input_svs_page = 3; % the page of the svs file we're interested in loading
  input_svs_file = '36729.svs';
  [~, baseFilename, ~] = fileparts(input_svs_file);

  svs_adapter = PagedTiffAdapter(input_svs_file, input_svs_page); % create an adapter which modulates how the large svs file is accessed

  tic
  fun = @(block) imwrite(block.data, sprintf('%s_%d_%d.png', baseFilename, block.location(1), block.location(2))); % save each tile with its row/column offset in the filename so that we can find this tile again later
  blockproc(svs_adapter, tileSize, fun); % perform the splitting
  toc

While the tileSize needs to be a multiple of 16, that isn't a constraint at this stage; it's actually a requirement for saving large tif images, as described in the following part. We can see here that the basic premise is straightforward: we use blockproc to iterate through the image and save sub-images. At this point, we could process the tiles directly if we wanted (for example, by resizing them smaller, or actually doing the analysis). I didn't opt for that here for two reasons: (a) the deep learning pipeline I have isn't in Matlab, it's in Python, so these tiles will be used outside of this development environment, and (b) since we now have all the tiles, we can easily parallelize whatever processing we're interested in and compute the output for multiple tiles at the same time, as the sketch below illustrates.
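As an illustration of point (b), here is a minimal sketch using a parfor loop (Parallel Computing Toolbox); the half-size resize is just a placeholder for whatever per-tile computation you actually run, and the _prob.png suffix matches what the stitching step below expects:

  tiles = dir(sprintf('%s_*_*.png', baseFilename)); % every tile written above (assumes a clean directory)
  parfor i = 1:numel(tiles)
      tile = imread(tiles(i).name);
      out = imresize(tile, 0.5); % stand-in for the real per-tile analysis
      imwrite(out, strrep(tiles(i).name, '.png', '_prob.png')); % name the output so the stitching step can find it
  end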

So, we start with this image:

[Figure: the original image]

And after splitting, we can see that there are now multiple, non-overlapping tiles:

[Figure: the resulting small tiles]

Note that not all the tiles are 2000 x 2000; tiles along the right and bottom edges are smaller. Also, we can see that some of them consist entirely of background. We can leverage this fact to avoid computing an entire tile should we desire (for example, by requiring a minimum number of pixels to be non-background via a color threshold), as sketched below.
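A minimal sketch of such a check; the near-white threshold of 220 and the 10% tissue cutoff here are arbitrary assumptions:

  tile = imread(sprintf('%s_%d_%d.png', baseFilename, 1, 1)); % any tile of interest
  background = all(tile > 220, 3); % pixels where all three channels are near-white
  if mean(~background(:)) > 0.1 % require at least 10% non-background pixels before bothering
      % ... run the expensive analysis on this tile ...
  end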

Interlude

Now that we have our tiles, we can compute their respective output. Nothing too surprising here 🙂

We can see the output here:

[Figure: the per-tile outputs]

Putting it back together

Having the output means it's time to stitch the images back together.

  tic
  outFile = '36729.tif'; % desired output filename
  inFileInfo = imfinfo(input_svs_file); % need to figure out the final output size to create the empty tif that will be filled in
  inFileInfo = inFileInfo(input_svs_page); % imfinfo returns a struct for each individual page, so we again select the page we're interested in

  outFileWriter = bigTiffWriter(outFile, inFileInfo.Height, inFileInfo.Width, tileSize(1), tileSize(2), true); % create another image adapter, this time for output writing

  fun = @(block) imresize(repmat(imread(sprintf('%s_%d_%d_prob.png', baseFilename, block.location(1), block.location(2))), [1 1 3]), 1.666666666); % load the output image, whose filename is known from the tile's row/column offsets; my output is 60% of the original tile size, so scale it back up by 1/0.6

  blockproc(svs_adapter, tileSize, fun, 'Destination', outFileWriter); % run blockproc again, which yields the same row/column coordinates, except now we specify the output image adapter to write the file out

  outFileWriter.close(); % close the file when we're done
  toc

This process should be straightforward. We need to specify the desired output filename (ending in .tif, of course). Then we leverage the bigTiffWriter provided here to incrementally fill in the final image. Notice that we've made the strong assumption that blockproc is deterministic, in that given the same image it will always crop the tiles at the same places, which is in fact true. The only small difference here is that my output images are of a different size (due to the deep learning pipeline that created them), so I take this opportunity to scale them back up to the expected size. Also, I've added an option to my bigTiffWriter to support compression, which is the 6th argument in the constructor.
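As a quick sanity check (not part of the original pipeline), one can confirm that the stitched file has the expected dimensions:

  outInfo = imfinfo(outFile);
  fprintf('stitched: %d x %d, original page: %d x %d\n', outInfo.Width, outInfo.Height, inFileInfo.Width, inFileInfo.Height);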

[Figure: the final stitched output]

This final image is 10MB and is nicely stitched back together. I lovingly call this process Humpty Dumpty. We can see that the DL pipeline is doing a great job of identifying the cribriform pattern, but that conversation is for another day : )

Code is available here.

29 thoughts on “Dividing and re-merging large images (Humpty Dumpty)”

  1. Another approach using imagemagick:

    this will split it into 1000 x 1000 images:

    convert -crop 1000x1000 INPUT_IMAGE_NAME cropped_%d.png

    this will merge them back together. “9x” specifies 9 tiles across, which i got by dividing the image width by 1,000 pixels (from the previous command) and then taking the ceil:
    montage `ls -1 cr* | sort -V` -tile 9x -geometry +0+0 result_prob.png

  2. I do not have svs files but jp2 files. Do you have any suggestions for splitting them? I can open them in Aperio ImageScope too, but I do not have svs versions of them!

    1. Hi Nik,

      How did your code look at the end? I’m working on jp2 files too and it just won’t work. Cheers

  3. I got my answer! blockproc supports TIFF and JPEG2000 (jp2) natively, so there is no need to use an image adapter for this data format.

    1. Hi Nik,
      I want to work with pathology jp2 images, which are large. I am a beginner in this research. Do you know how I could split big jp2 images into blocks?
      Thank you

  4. I tried using this code but my output images consist of only black pixels (empty tiff images are being created). I am not able to understand why.

    1. the matlab version? what kind of input image is it? some of the codecs aren’t supported, unfortunately

      if you scroll up the comments, you’ll find a linux command line version using convert from imagemagick, which may be more robust. they have pretty robust image format support

      1. input image is an svs image. when i run imfinfo on it, i get only 4 rows. Also I am able to display the thumbnail version using page 2. Any other page number gives a black image.

          1. it must be an old svs file (or from an older scanner); they use some compression which matlab never seems to read correctly. if you want to use matlab, i recommend using the openslide library to read the image

  5. Hi, can you tell me what changes I have to make, and where, if I read my file through openslide? I am trying, but I cannot change the code appropriately.

    1. sorry, i don’t use this code anymore. the imagemagick convert approach discussed above is my preferred method. have you tried it? if you really want to use matlab + openslide, i’d suggest just using 2 for loops and avoiding the PagedTiffAdapter approach

  6. Can you go into further detail on how you used the imagemagick convert approach? Where do we execute this command? What are the inputs/parameters?

      1. I see! It seems to be working now. However, for large .svs files, the cropping function takes quite a long time. Do you know if there is any way to accelerate this process by using the computer’s GPU?

        1. if you look at the system resources while the conversion is taking place, i think you’ll find that the bottleneck is getting the file off of the hard drive and writing the tiles. there is essentially no “computation” which takes place, just loading and saving, so if anything i imagine adding a GPU to the mix will make things slower instead of faster.

  7. I have tried to use this method to split images with 4K resolution and, after processing, merge them together. However, when I run the split images to merge them, the output .tif file doesn’t show me anything; instead it prompts that the format is not supported. I tried to change the RGB scale value and compression rate too, but nothing works in merging the images, although the split part is working fine.

    Anyone with a heads-up, kindly help. Thanks

  8. please help me. How can I input those commands in Aperio? I don’t have that much experience in writing scripts, but could you at least help me with how to input those commands in Aperio? I would be delighted by your reply.

    1. i think you’ll need to find someone with some experience to help walk you through this. none of these tools are designed to be used through the image viewer, but instead through programming environments

    1. sorry, not sure i understand your question. svs files tend to be about 2GB simply because they contain very large images of e.g., 100k x 100k pixels. if you’re interested in reducing the size, you could reprocess them with heavy compression, but you’ll start to see artifacts and drops in overall algorithmic performance as a result. we wrote a paper about that here: https://pubmed.ncbi.nlm.nih.gov/32155093/
