Converting an existing image into an OpenSlide compatible format

Many digital pathology tools (e.g., our quality control tool, HistoQC) employ OpenSlide, a library for reading whole slide images (WSI). OpenSlide provides a reliable abstraction over a number of proprietary WSI file formats, so that a single programmatic interface can be used to access WSI metadata and image data.

Unfortunately, when smaller regions of interest or new images are created in tif/png/jpg formats, they are no longer compatible with OpenSlide. This blog post discusses how to take any image and convert it into an OpenSlide-compatible WSI with embedded metadata.

We begin with this 2,000 x 2,000 pixel png image:

If we try to open this png image with OpenSlide, it fails immediately, reporting that the file type is unsupported:

osh  = openslide.OpenSlide("17642_500_f00001_original.png")
---------------------------------------------------------------------------
OpenSlideUnsupportedFormatError           Traceback (most recent call last)
<ipython-input-5-bc5d9b026f0f> in <module>
----> 1 osh  = openslide.OpenSlide("17642_500_f00001_original.png")

c:\python37\lib\site-packages\openslide\__init__.py in __init__(self, filename)
    152         AbstractSlide.__init__(self)
    153         self._filename = filename
--> 154         self._osr = lowlevel.open(filename)
    155 
    156     def __repr__(self):

c:\python37\lib\site-packages\openslide\lowlevel.py in _check_open(result, _func, _args)
    172     if result is None:
    173         raise OpenSlideUnsupportedFormatError(
--> 174                 "Unsupported or missing image file")
    175     slide = _OpenSlide(c_void_p(result))
    176     err = get_error(slide)

OpenSlideUnsupportedFormatError: Unsupported or missing image file

The OpenSlide documentation (https://OpenSlide.org/formats/generic-tiff/) lists the minimum requirements for a compatible TIFF image:

  1. No other detections succeed.
  2. The file is TIFF.
  3. The initial image is tiled.

Thus, we need to convert our png image to a TIFF file and ensure that the first image is tiled.

Following this tutorial, we can do this with either of two common tools: VIPS or ImageMagick. Note that VIPS is more readily available on both Linux and Windows.

Using VIPS

For VIPS, we use the im_vips2tiff command:

C:\research\tools\vips-dev-8.8\bin\vips im_vips2tiff 17642_500_f00001_original.png 17642_500_f00001_original_tiled.tif:jpeg:75,tile:256x256,pyramid

Here, “jpeg” selects JPEG compression and 75 sets the compression quality.

Most importantly, “tile” specifies that the image should be tiled (one of the OpenSlide requirements), with a tile size of 256 x 256.

Lastly, for very large WSIs, we may want to specify “pyramid” so that the final TIFF also contains multiple lower resolutions. Note that this is not a requirement.
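To script this conversion over many images, a minimal Python sketch is shown below. It assumes a libvips 8.x `vips` executable on the PATH and uses the `tiffsave` operation (with its `--compression`, `--Q`, `--tile`, `--tile-width`, `--tile-height` and `--pyramid` options) instead of `im_vips2tiff`; the helper names are hypothetical.

```python
import subprocess
from pathlib import Path


def build_tiffsave_cmd(src, dst, quality=75, tile_size=256, pyramid=True, vips="vips"):
    """Return the argument list for a tiled, JPEG-compressed TIFF via `vips tiffsave`."""
    cmd = [
        vips, "tiffsave", str(src), str(dst),
        "--compression", "jpeg",
        "--Q", str(quality),              # JPEG quality, like the 75 in jpeg:75
        "--tile",                         # tiled output: required by OpenSlide
        "--tile-width", str(tile_size),
        "--tile-height", str(tile_size),
    ]
    if pyramid:
        cmd.append("--pyramid")           # optional: add lower-resolution levels
    return cmd


def convert_to_tiled_tiff(src, dst=None, **kwargs):
    """Run the conversion; raises CalledProcessError if vips fails."""
    src = Path(src)
    # Default output name: <stem>_tiled.tif next to the input file
    dst = Path(dst) if dst else src.with_name(src.stem + "_tiled.tif")
    subprocess.run(build_tiffsave_cmd(src, dst, **kwargs), check=True)
    return dst
```

For example, `convert_to_tiled_tiff("17642_500_f00001_original.png")` would write `17642_500_f00001_original_tiled.tif`. Building the argument list separately makes it easy to inspect or log the exact command before running it.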

Using ImageMagick (convert)

Alternatively, the same process can be performed with ImageMagick:

convert 17642_500_f00001_original.png -define tiff:tile-geometry=256x256 -compress jpeg ptif:17642_500_f00001_original_tiled.tif

Here, tiff:tile-geometry=256x256 produces a tiled TIFF image with the given tile size, and the “ptif” prefix requests a pyramidal TIFF.

Readable but metadata-less

After performing this conversion, we can now open the file with OpenSlide and examine its properties:

>>> osh  = openslide.OpenSlide("17642_500_f00001_original_tiled.tif")
>>> osh.properties
       <_PropertyMap {'openslide.level-count': '4', 'openslide.level[0].downsample': '1', 'openslide.level[0].height': '2000', 'openslide.level[0].tile-height': '256', 'openslide.level[0].tile-width': '256', 'openslide.level[0].width': '2000', 'openslide.level[1].downsample': '2', 'openslide.level[1].height': '1000', 'openslide.level[1].tile-height': '256', 'openslide.level[1].tile-width': '256', 'openslide.level[1].width': '1000', 'openslide.level[2].downsample': '4', 'openslide.level[2].height': '500', 'openslide.level[2].tile-height': '256', 'openslide.level[2].tile-width': '256', 'openslide.level[2].width': '500', 'openslide.level[3].downsample': '8', 'openslide.level[3].height': '250', 'openslide.level[3].tile-height': '256', 'openslide.level[3].tile-width': '256', 'openslide.level[3].width': '250', 'openslide.quickhash-1': '47f0c707de51f93560f6378ab9a7b6fd92cd5a43bec6ef2c6bb1f48d553f4091', 'openslide.vendor': 'generic-tiff', 'tiff.ResolutionUnit': 'centimeter', 'tiff.XResolution': '28.350000477234143', 'tiff.YResolution': '28.350000477234143'}>                

We can see that the chosen reader (openslide.vendor) is generic-tiff, which matches our expectation from the documentation. Almost there!

Adding Metadata

Looking closer, however, we can see that many pieces of metadata required by tools such as HistoQC are missing, in particular the apparent magnification, microns per pixel, etc. The generic-tiff reader does not natively support this information, so we need to spoof another format in which it is populated.
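A quick way to see exactly which properties are absent is to check the property map directly. The sketch below is hypothetical: the three property names are ones OpenSlide reports for the Aperio-tagged file later in this post, but which properties a given tool actually needs is an assumption you should adapt.

```python
# Hypothetical list of properties a downstream tool might require;
# adjust it to whatever your own pipeline actually reads.
REQUIRED_PROPS = ("openslide.objective-power", "openslide.mpp-x", "openslide.mpp-y")


def summarize_metadata(props, required=REQUIRED_PROPS):
    """Return (vendor, missing) for an OpenSlide property mapping or a plain dict."""
    vendor = props.get("openslide.vendor", "unknown")
    missing = [key for key in required if key not in props]
    return vendor, missing


# Example (requires openslide-python and the converted file):
# import openslide
# osh = openslide.OpenSlide("17642_500_f00001_original_tiled.tif")
# print(summarize_metadata(osh.properties))
```

On the generic-tiff properties shown above, all three names would be reported as missing; on the Aperio-tagged properties shown later, none would be.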

For our purposes, we will use Aperio SVS, which according to the OpenSlide documentation has these properties (https://OpenSlide.org/formats/aperio/):

  1. The file is TIFF.
  2. The initial image is tiled.
  3. The ImageDescription tag starts with Aperio.

The only difference, as far as OpenSlide is concerned, between the two file formats is that Aperio additionally requires the ImageDescription tag to start with “Aperio”.
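OpenSlide's actual detection logic is more involved, but as a rough sanity check, the stdlib-only Python sketch below inspects the first image file directory (IFD) of a TIFF and reports whether it carries the TileWidth/TileLength tags (322/323) and whether its ImageDescription tag (270) starts with “Aperio”. It is a hypothetical helper, handles only classic (non-BigTIFF) files, and cannot test the “no other detections succeed” rule.

```python
import struct

# Byte sizes of the classic TIFF field types (BYTE, ASCII, SHORT, LONG, RATIONAL, ...)
TYPE_SIZES = {1: 1, 2: 1, 3: 2, 4: 4, 5: 8, 6: 1, 7: 1, 8: 2, 9: 4, 10: 8, 11: 4, 12: 8}
IMAGE_DESCRIPTION, TILE_WIDTH, TILE_LENGTH = 270, 322, 323


def read_first_ifd(f):
    """Return {tag: raw value bytes} for the first IFD of a classic TIFF file object."""
    f.seek(0)
    header = f.read(8)
    if header[:2] == b"II":
        endian = "<"                      # little-endian
    elif header[:2] == b"MM":
        endian = ">"                      # big-endian
    else:
        raise ValueError("not a TIFF file")
    magic, ifd_offset = struct.unpack(endian + "HI", header[2:8])
    if magic != 42:
        raise ValueError("not a classic TIFF (BigTIFF is not handled in this sketch)")
    f.seek(ifd_offset)
    (n_entries,) = struct.unpack(endian + "H", f.read(2))
    tags = {}
    for _ in range(n_entries):
        tag, typ, count, inline = struct.unpack(endian + "HHI4s", f.read(12))
        size = TYPE_SIZES.get(typ, 1) * count
        if size <= 4:
            tags[tag] = inline[:size]     # value stored inside the entry itself
        else:
            (offset,) = struct.unpack(endian + "I", inline)
            here = f.tell()
            f.seek(offset)
            tags[tag] = f.read(size)      # value stored elsewhere in the file
            f.seek(here)
    return tags


def check_first_image(f):
    """Report the OpenSlide-relevant properties of the first image in a TIFF."""
    tags = read_first_ifd(f)
    tiled = TILE_WIDTH in tags and TILE_LENGTH in tags
    desc = tags.get(IMAGE_DESCRIPTION, b"").rstrip(b"\0").decode("ascii", "replace")
    return {"tiled": tiled, "description": desc,
            "aperio_like": tiled and desc.startswith("Aperio")}
```

For example, `with open("17642_500_f00001_original_tiled.tif", "rb") as f: print(check_first_image(f))` should report a tiled first image, and `aperio_like` should become True once the ImageDescription has been set as described next.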

We can explicitly set ImageDescription using either tifftools (https://pypi.org/project/tifftools/) or tiffset (from libtiff, http://www.libtiff.org/tools.html), and at the same time specify our magnification and other Aperio metadata.

Note that because tifftools ALWAYS rewrites the file, it is slower than libtiff’s tiffset and most EXIF tools; however, since tifftools is pure Python, it is easier to use across platforms.

Using tifftools dump to examine an existing Aperio svs file, we can see what the ImageDescription is:

 "Aperio Digital Slide 126976x94208 [0,0 126976x92416] (256x256) JPEG/RGB Q=80|AppMag = 40|MPP = 0.23|Date = 28/02/2019|Time = 14:06:50"  

Here we can see “AppMag” for the apparent magnification, “MPP” for microns per pixel, and some other associated metadata.
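To make the structure concrete, the hypothetical Python helper below splits such a description on `|` into the leading header text and its `key = value` fields. This is similar in spirit to the `aperio.AppMag` and `aperio.MPP` properties OpenSlide reports later, but it is only a sketch, not OpenSlide's actual parser.

```python
def parse_aperio_description(desc):
    """Split an Aperio-style ImageDescription into (header, {key: value})."""
    if not desc.startswith("Aperio"):
        raise ValueError("ImageDescription does not start with 'Aperio'")
    header, *fields = desc.split("|")
    props = {}
    for field in fields:
        key, sep, value = field.partition("=")   # split on the first '=' only
        if sep:                                  # ignore fields without '='
            props[key.strip()] = value.strip()
    return header.strip(), props
```

Applied to the description above, this would yield `AppMag` = "40" and `MPP` = "0.23", with the scanner summary ("Aperio Digital Slide ...") kept as the header.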

There is quite a bit of flexibility: we can modify this information as we see fit, even including only a small subset. Here, as a minimal working example, we set the magnification and MPP, both of which are important for HistoQC:

tifftools set -y -s ImageDescription "Aperio Fake |AppMag = 40|MPP = 0.23" 17642_500_f00001_original_tiled.tif 17642_500_f00001_original_tiled_meta.tif
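When tagging many files, the same step can be wrapped in a small Python sketch. The helper names are hypothetical; it assumes the `tifftools` command-line tool is installed and on the PATH, and that `tifftools set` accepts an optional output path after the source (omitting it modifies the source file in place).

```python
import subprocess


def aperio_description(app_mag, mpp, note="Fake"):
    """Build a minimal Aperio-style ImageDescription string."""
    return f"Aperio {note} |AppMag = {app_mag}|MPP = {mpp}"


def tifftools_set_cmd(src, description, dst=None):
    """Argument list mirroring the `tifftools set` command above."""
    cmd = ["tifftools", "set", "-y", "-s", "ImageDescription", description, str(src)]
    if dst is not None:
        cmd.append(str(dst))          # write the tagged copy here instead of in place
    return cmd


def tag_as_aperio(src, app_mag, mpp, dst=None):
    """Run tifftools on one file; raises CalledProcessError on failure."""
    subprocess.run(tifftools_set_cmd(src, aperio_description(app_mag, mpp), dst), check=True)
```

Keeping the “Fake” marker as the default `note` preserves the convention of flagging metadata that did not come from the scanner.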

Note that I include “Fake” to indicate to future users of the file that this metadata was artificially added and is not the original metadata provided by the scanner.

Lastly, when we open the file with OpenSlide, we can see that all of our metadata is set appropriately:

>>> osh  = openslide.OpenSlide("17642_500_f00001_original_tiled_meta.tif")
>>> osh.properties
 <_PropertyMap {'aperio.AppMag': '40', 'aperio.MPP': '0.23', 'openslide.comment': 'Aperio Fake |AppMag = 40|MPP = 0.23', 'openslide.level-count': '4', 'openslide.level[0].downsample': '1', 'openslide.level[0].height': '2000', 'openslide.level[0].tile-height': '256', 'openslide.level[0].tile-width': '256', 'openslide.level[0].width': '2000', 'openslide.level[1].downsample': '2', 'openslide.level[1].height': '1000', 'openslide.level[1].tile-height': '256', 'openslide.level[1].tile-width': '256', 'openslide.level[1].width': '1000', 'openslide.level[2].downsample': '4', 'openslide.level[2].height': '500', 'openslide.level[2].tile-height': '256', 'openslide.level[2].tile-width': '256', 'openslide.level[2].width': '500', 'openslide.level[3].downsample': '8', 'openslide.level[3].height': '250', 'openslide.level[3].tile-height': '256', 'openslide.level[3].tile-width': '256', 'openslide.level[3].width': '250', 'openslide.mpp-x': '0.23000000000000001', 'openslide.mpp-y': '0.23000000000000001', 'openslide.objective-power': '40', 'openslide.quickhash-1': '05d86bf08deb3a9bfa5d2dd9e0203d73f38bfb567159748afda09a6976aabc95', 'openslide.vendor': 'aperio', 'tiff.ImageDescription': 'Aperio Fake |AppMag = 40|MPP = 0.23', 'tiff.ResolutionUnit': 'centimeter', 'tiff.XResolution': '28.350000477234143', 'tiff.YResolution': '28.350000477234143'}>
>>> osh.properties['openslide.objective-power']
 '40'

We can also see that our exact description is reproduced in the openslide.comment property:

>>> osh.properties['openslide.comment']
 'Aperio Fake |AppMag = 40|MPP = 0.23'

Thankfully, this now allows our HistoQC pipeline to analyze the file successfully, though, importantly, since this isn’t a WSI, ROI-specific parameters are likely needed:

That's it! Best of luck!

30 thoughts on “Converting an existing image into an Openslide compatible format”

  1. Computational power: If one of the desired benefits of going digital is the ability to employ sophisticated machine learning algorithms, computational power may be another critical issue to consider. While image analysis is meant to enhance the routine work of pathologists, if the time taken for computational evaluation exceeds the anticipated turnaround time, the overall value proposition becomes weaker. As such, sufficient high-performance computing resources should be available based on expected throughput. Although digital slide images can be used for both clinical diagnostics and research, the systems supporting each workflow are often not compatible. Research typically requires greater openness, such as the ability to manipulate image data directly by executing scripts in third-party software. After validation, these algorithms may be integrated into a diagnostic IMS.

  2. Hi Andrew,

    Actually, at our lab we’ve been working on a similar “Aperio Mimic” format, and I thought I’d share some of what we’ve found on how to extend this to other software. The software we’ve been testing against includes Concentriq, GIMP, HALO, QuPath, Aperio ImageScope, OpenSlide and libtiff.

    Our original version used VIPS to extract the image, but IIRC that doesn’t support CZI and in any case for broad input format support you really need bioformats, which will automatically extract the image from the correct directory in the input image and convert it to a standard form. We use bioformats’ “bfconvert” to save any of over 100 formats as a standard TIFF with no metadata.

    We also run bioformats’ “showinf” to generate an OME-XML document which allows us to extract the resolution and magnification info (the lowest level tags you need are “physicalsizexunit”, “physicalsizex”, and “nominalmagnification”- the rest of the tree is left as an exercise for the reader 🙂 ).

    We then have to do a second copy using libtiff’s “tiffcp” to set the PlanarConfiguration as “contiguous”. Aperio uses this planar config and some software takes it for granted, even though it’s technically not in the Aperio specification.

    This second copy could possibly be avoided if you made a small change to the bioformats library and recompiled, described here: https://github.com/ome/bioformats/issues/3628 , although we have not tested this.

    Following all that, we use a similar process to yours to add the metadata to the final mimic TIFF based on the OpenSlide Aperio format description. One little gotcha is that OpenSlide just needs the first word in ImageDescription to be “Aperio”, but HALO and QuPath need it to say “Aperio Image Library ” 🙂. We use v12.0.15 as our version number; no idea if this matters.

    OpenSlide is easier than some of the other software. It doesn’t need a thumbnail or a pyramid (QuPath does), it doesn’t need a specific planar configuration (HALO does; my notes say GIMP does too, although I can’t imagine why), and it doesn’t need a label (some of our use cases do).

    We’re still working on finalizing this format but we’re most of the way there. I hope to release it on my github page: https://github.com/markemus?tab=repositories once it’s ready assuming I can get permission, but no promises.

    Hopefully this helps save someone else dozens of hours!

  3. Hi Andrew,

    Another great post! I have a quick question. As in your example, AppMag and MPP are sometimes inconsistent; ideally it should be “AppMag = 40|MPP = 0.25”, but in the wild we find slides with “AppMag = 40|MPP = 0.23”, “AppMag = 40|MPP = 0.18” or “AppMag = 40|MPP = 0.11”. Do you know what causes these inconsistencies, and, if they are not caused by an error, how should we treat such slides, specifically to extract tissue/patches at the same magnification?

    1. Great question! If you look at the notes in the readme file, this gives a hint: “the new Pannoramic 1000 scanner, objective-magnification is given as 20, when a 20x objective lense and a 2x aperture boost is used, i.e. image magnification is actually 40x. While their own CaseViewer somehow determines that a boost exists and ends up with 40x when objective-magnification in Slidedat.ini is at 20, openslide and bioformats give 20x.” So it seems some scanners only take into account one of the lenses if multiple ones are present, causing some grief. Generally speaking, the MPP seems to be consistently more accurate than the apparent magnification, so that may be a good piece of metadata to focus on. To help figure this out, you can use the most basic of HistoQC pipelines, which only extracts the metadata. This will be very quick and also gives a high-level perspective of the heterogeneity in the data.

      1. I never thought about the multi-lens condition but that totally makes sense. Thank you so much for your detailed explanation and the suggestion!
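Following the suggestion to rely on MPP rather than AppMag, one way to extract patches at a consistent physical resolution is sketched below. The function is hypothetical: it assumes the slide MPP comes from a property such as `openslide.mpp-x` and the per-level downsamples from openslide-python's `level_downsamples`; OpenSlide's own `get_best_level_for_downsample()` performs a similar level choice.

```python
def pick_level(slide_mpp, target_mpp, level_downsamples):
    """Pick the coarsest pyramid level that is still at least as fine as target_mpp.

    Returns (level, extra_scale): read at `level`, then resize by `extra_scale`
    (< 1 shrinks, > 1 enlarges) to land on target_mpp microns per pixel.
    """
    wanted = target_mpp / slide_mpp              # total downsample needed
    best = 0
    for level, ds in enumerate(level_downsamples):
        # keep the coarsest level whose downsample does not overshoot the target
        if ds <= wanted + 1e-9 and ds > level_downsamples[best]:
            best = level
    achieved_mpp = slide_mpp * level_downsamples[best]
    return best, achieved_mpp / target_mpp
```

For a slide with MPP = 0.23 and downsamples (1, 2, 4, 8), a hypothetical 0.5 microns-per-pixel target selects level 1 followed by a resize of about 0.92, so slides with different MPPs end up at the same physical scale.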

  4. Hi Andrew,
    Thank you for the exciting post. I tried to convert .czi to .tif, but I noticed that .czi is not supported by VIPS or ImageMagick. For instance, the message when using VIPS on Windows was:

    VipsForeignLoad: “input_image.czi” is not a known file format.

    How to convert .czi to .tif?

    1. Great question, but unfortunately I don’t know the answer. I’ve not personally used a CZI image in a long time, but if I remember correctly, the software application that comes with the scanner allows exporting as TIF? Can you try that?

  5. Hi Andrew,
    Thank you so much for this excellent post;
    it helps me a lot!
    I have a question about the “compression” used when downscaling.
    I searched through lots of posts, and
    the methods used for downscaling SVS files always used “JPEG”, but I couldn’t find any post explaining why.
    Can you give me some advice?

    Thank you so much!
    Tiphanie

    1. Well, without compression, SVS files would be too large to reasonably transfer, and one likely doesn’t need a perfect-fidelity image. As a result, most scanners that I know of don’t scan into an uncompressed format, and instead use JPEG at ~80% quality, which gives a nice reduction in overall file size without very noticeably degrading the visual presentation. That said, once we know that the original image is JPEG-compressed, we need to be careful not to “mix and match” compression algorithms, as the combination of the artifacts they produce may create very unexpected results. As such, I think most folks stay with JPEG compression. Does that answer your question? We also have a paper on the effects of compression that you can find here: https://pubmed.ncbi.nlm.nih.gov/32155093/

      1. Thank you for taking the time for the detailed explanation!
        I finally have some direction about this part!
        I’ll read your paper about the effects of compression soon. 🙂
        Very happy to read your informative blog, the posts about downloading TCGA data also helped me a lot in my research! I’m a big fan of yours from Taiwan. ^^

        Sincerely,
        Tiphanie

  6. Hi Andrew, I love this post, but I have a question along the same lines. I have some DICOM files with embedded metadata, and I want to convert them to files that are readable by the OpenSlide library. My approach is to convert the DICOM file to a TIFF file and embed the metadata into it. I managed to embed the metadata in the TIFF file, but it is not viewable with OpenSlide, so I think I will follow your tutorial. Am I on the right track?
    Thanks, I appreciate this post.

    So far I tried to embed the metad

      1. Hey Andrew, I was able to do it. I have another thing that I wanted to try out.
        I guess the answer is probably no, but is it possible to create a .SVS file by changing the file format of a TIF file? I really want to create a .SVS file, but I can’t find any resource for that, and most resources say that it is impossible. Do you have any comments on that?
        Thanks! I appreciate your guidance.

        1. Okay! Well, you should be able to follow this blog post to generate a sufficiently convincing SVS file that OpenSlide will open. Good luck!

          1. You said “this procedure does generate svs formatted files.” I want to know which procedure generates SVS-formatted files. Thanks very much.

  7. Hi Andrew, is there any way to change only the AppMag of the TIF file through tifftools? I can see that you change the image description and add AppMag to it, but is there any way to keep the image description unchanged and only change the AppMag?
    I’m trying to write a script for that.
    Thanks Andrew!

    1. Sure, I don’t see why not. I would treat it like a string: use “get” first to pull in the original one, modify it as needed, and then set it.
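    Following that get/modify/set suggestion, the string-editing step might look like the hypothetical helper below. It assumes you already have the current description (for example from OpenSlide's `tiff.ImageDescription` property or from `tifftools dump`) and will write the result back with `tifftools set`.

```python
def set_aperio_field(desc, key, value):
    """Return desc with one '|key = value' field replaced, or appended if absent."""
    header, *fields = desc.split("|")
    updated, found = [], False
    for field in fields:
        name = field.partition("=")[0].strip()
        if name == key:
            updated.append(f"{key} = {value}")   # replace only the requested field
            found = True
        else:
            updated.append(field)                # keep every other field untouched
    if not found:
        updated.append(f"{key} = {value}")
    return "|".join([header] + updated)
```

    Because the header and all other fields are copied verbatim, only the AppMag value changes in the rewritten description.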

  8. Hi Andrew,

    This was very helpful. Interestingly, I managed to do the whole process in one shot via pyvips:

    import pyvips

    image = pyvips.Image.new_from_file(…)
    # copy() gives a private image whose metadata can be modified safely
    image = image.copy()
    image.set_type(pyvips.GValue.gstr_type, 'image-description', 'Aperio Image Library v12.0.15|AppMag = 20|MPP = 0.5')
    image.write_to_file(…)

    Notes:
    * May need to use set instead of set_type if ImageDescription already exists in original image.
    * OpenSlide recognized the resulting .tif file as “aperio”, but I needed to change the extension to .svs for it to be recognized by cuCIM.
