Converting an existing image into an Openslide compatible format

Many digital pathology tools (e.g., our quality control tool, HistoQC) employ OpenSlide, a library for reading whole slide images (WSI). OpenSlide provides a reliable abstraction over a number of proprietary WSI file formats, so that a single programmatic interface can be used to access both WSI metadata and image data.

Unfortunately, when smaller regions of interest, or new images, are created in tif/png/jpg formats, they are no longer compatible with OpenSlide. This blog post discusses how to take any image and convert it into an OpenSlide-compatible WSI with embedded metadata.

We begin with this 2,000 x 2,000 pixel png image:

If we try to open this png image with OpenSlide, it immediately raises an error saying that the file type is unsupported:

osh  = openslide.OpenSlide("17642_500_f00001_original.png")
---------------------------------------------------------------------------
OpenSlideUnsupportedFormatError           Traceback (most recent call last)
<ipython-input-5-bc5d9b026f0f> in <module>
----> 1 osh  = openslide.OpenSlide("17642_500_f00001_original.png")

c:\python37\lib\site-packages\openslide\__init__.py in __init__(self, filename)
    152         AbstractSlide.__init__(self)
    153         self._filename = filename
--> 154         self._osr = lowlevel.open(filename)
    155 
    156     def __repr__(self):

c:\python37\lib\site-packages\openslide\lowlevel.py in _check_open(result, _func, _args)
    172     if result is None:
    173         raise OpenSlideUnsupportedFormatError(
--> 174                 "Unsupported or missing image file")
    175     slide = _OpenSlide(c_void_p(result))
    176     err = get_error(slide)

OpenSlideUnsupportedFormatError: Unsupported or missing image file

If we look at the OpenSlide documentation we can see the minimum specifications needed to produce a compatible tif image, from https://OpenSlide.org/formats/generic-tiff/ :

  1. No other detections succeed.
  2. The file is TIFF.
  3. The initial image is tiled.

Thus we need to take our png image, convert it to a TIF file and ensure that the first image is tiled.
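These two requirements can be checked without OpenSlide. The following is a minimal sketch, not part of OpenSlide itself: it reads the header of a classic (non-BigTIFF) file and looks for the standard TIFF TileWidth (322) and TileLength (323) tags in the first directory. The function name first_ifd_is_tiled is a hypothetical helper introduced here for illustration.

```python
import struct

TILE_WIDTH_TAG = 322   # standard TIFF tag: TileWidth
TILE_LENGTH_TAG = 323  # standard TIFF tag: TileLength


def first_ifd_is_tiled(fh):
    """Return True if the binary file object is a classic TIFF whose first image is tiled."""
    header = fh.read(8)
    if len(header) < 8:
        return False
    # The first two bytes give the byte order: "II" little endian, "MM" big endian.
    if header[:2] == b"II":
        endian = "<"
    elif header[:2] == b"MM":
        endian = ">"
    else:
        return False  # not a TIFF at all (e.g. a png)
    magic, ifd_offset = struct.unpack(endian + "HI", header[2:8])
    if magic != 42:  # 43 would be BigTIFF, which this sketch does not handle
        return False
    # Jump to the first image file directory (IFD) and read its entry count.
    fh.seek(ifd_offset)
    raw_count = fh.read(2)
    if len(raw_count) < 2:
        return False
    (n_entries,) = struct.unpack(endian + "H", raw_count)
    # Each IFD entry is 12 bytes and starts with a 2-byte tag number.
    entries = fh.read(12 * n_entries)
    tags = {
        struct.unpack(endian + "H", entries[i:i + 2])[0]
        for i in range(0, len(entries) - 11, 12)
    }
    return TILE_WIDTH_TAG in tags and TILE_LENGTH_TAG in tags
```

Opening a file in binary mode, e.g. open("17642_500_f00001_original.png", "rb"), and passing the handle to this function should return False for our png, since it does not start with a TIFF byte-order marker.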

To do this, following this tutorial, we can use one of two common tools, VIPS or ImageMagick. Note that VIPS is more readily available for both Linux and Windows.

Using VIPS

For VIPS, we use the im_vips2tiff command:

C:\research\tools\vips-dev-8.8\bin\vips im_vips2tiff 17642_500_f00001_original.png 17642_500_f00001_original_tiled.tif:jpeg:75,tile:256x256,pyramid

Where “jpeg” specifies jpeg compression, and 75 specifies the compression quality.

Most importantly, “tile” specifies that we want the image to be tiled (this is one of the OpenSlide requirements), followed by a tile size of 256 x 256.

Lastly, for very large WSIs, we may want to specify “pyramid” so that the final tif has multiple smaller resolutions available within it. Note that this is not a requirement.
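When converting many images, it can help to assemble this command programmatically. Below is a small sketch that only builds the argument list in the same format as the command above; the function name and its defaults are hypothetical, and it assumes a vips executable is reachable on the PATH if the command is actually run.

```python
import subprocess


def build_vips_tiff_command(src, dst, quality=75, tile=256, pyramid=True, vips="vips"):
    """Build the im_vips2tiff argument list: output.tif:jpeg:Q,tile:WxH[,pyramid]."""
    options = f"jpeg:{quality},tile:{tile}x{tile}"
    if pyramid:
        options += ",pyramid"  # optional: store multiple resolutions in the tif
    return [vips, "im_vips2tiff", src, f"{dst}:{options}"]


def convert_to_tiled_tiff(src, dst, **kwargs):
    """Run the conversion; raises CalledProcessError if vips reports a failure."""
    subprocess.run(build_vips_tiff_command(src, dst, **kwargs), check=True)


cmd = build_vips_tiff_command(
    "17642_500_f00001_original.png",
    "17642_500_f00001_original_tiled.tif",
)
print(" ".join(cmd))
```

Passing the arguments as a list rather than a single string avoids shell quoting issues with the colons and commas in the output specification.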

Using ImageMagick (convert)

Alternatively, the same process can be performed with ImageMagick:

convert 17642_500_f00001_original.png -define tiff:tile-geometry=256x256 -compress jpeg 'ptif:17642_500_f00001_original_tiled.tif'

Where tiff:tile-geometry=256x256 defines a tiled tif image and its tile size, and “ptif” specifies pyramidal tif.

Readable but metadata-less

After performing this conversion, we can now open the file with OpenSlide and examine its properties:

>>> osh  = openslide.OpenSlide("17642_500_f00001_original_tiled.tif")
>>> osh.properties
       <_PropertyMap {'openslide.level-count': '4', 'openslide.level[0].downsample': '1', 'openslide.level[0].height': '2000', 'openslide.level[0].tile-height': '256', 'openslide.level[0].tile-width': '256', 'openslide.level[0].width': '2000', 'openslide.level[1].downsample': '2', 'openslide.level[1].height': '1000', 'openslide.level[1].tile-height': '256', 'openslide.level[1].tile-width': '256', 'openslide.level[1].width': '1000', 'openslide.level[2].downsample': '4', 'openslide.level[2].height': '500', 'openslide.level[2].tile-height': '256', 'openslide.level[2].tile-width': '256', 'openslide.level[2].width': '500', 'openslide.level[3].downsample': '8', 'openslide.level[3].height': '250', 'openslide.level[3].tile-height': '256', 'openslide.level[3].tile-width': '256', 'openslide.level[3].width': '250', 'openslide.quickhash-1': '47f0c707de51f93560f6378ab9a7b6fd92cd5a43bec6ef2c6bb1f48d553f4091', 'openslide.vendor': 'generic-tiff', 'tiff.ResolutionUnit': 'centimeter', 'tiff.XResolution': '28.350000477234143', 'tiff.YResolution': '28.350000477234143'}>                

We can see that the chosen reader (openslide.vendor) is called generic-tiff, which matches our expectation from the documentation. Almost there!
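The four levels reported above (2000, 1000, 500 and 250 pixels, with downsamples 1, 2, 4 and 8) are consistent with a pyramid that is halved until the image fits within a single 256 x 256 tile. The sketch below reproduces those numbers under that assumption; the exact stopping and rounding rules of the writer are not taken from its documentation, so treat this as an illustration rather than a description of the implementation.

```python
def pyramid_levels(width, height, tile=256):
    """List (width, height) per level, halving until the image fits in one tile."""
    levels = [(width, height)]
    while width > tile or height > tile:
        # Integer halving; real writers may round differently for odd sizes.
        width, height = max(1, width // 2), max(1, height // 2)
        levels.append((width, height))
    return levels


levels = pyramid_levels(2000, 2000)
print(levels)
# [(2000, 2000), (1000, 1000), (500, 500), (250, 250)]
```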

Adding Metadata

If we look closer, however, we can see that many pieces of metadata which tools (such as HistoQC) require are missing, in particular the apparent magnification and the microns per pixel. The generic-tiff reader does not natively support this information, and thus we need to spoof another format which has this information populated.
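One might hope to recover the microns per pixel from the TIFF resolution tags that are present. A quick sketch shows why that does not work here: converting the reported 28.35 pixels per centimeter gives roughly 353 microns per pixel, which corresponds to about 72 DPI and so looks like a default placeholder rather than a scanner resolution. The helper name below is hypothetical, and the properties are passed as a plain dict copied from the output above.

```python
def mpp_from_tiff_resolution(properties):
    """Convert tiff.XResolution (pixels per unit) into microns per pixel."""
    microns_per_unit = {"centimeter": 10_000.0, "inch": 25_400.0}
    unit = properties["tiff.ResolutionUnit"]
    pixels_per_unit = float(properties["tiff.XResolution"])
    return microns_per_unit[unit] / pixels_per_unit


props = {
    "tiff.ResolutionUnit": "centimeter",
    "tiff.XResolution": "28.350000477234143",
}
mpp = mpp_from_tiff_resolution(props)
print(round(mpp, 1))  # 352.7, far from a plausible value such as 0.23
```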

For our purposes, we will use Aperio SVS, which according to the OpenSlide documentation has these properties (https://OpenSlide.org/formats/aperio/):

  1. The file is TIFF.
  2. The initial image is tiled.
  3. The ImageDescription tag starts with Aperio.

We can see that the only main difference, in terms of OpenSlide, between the two file formats is that Aperio has the additional constraint of requiring “ImageDescription” to be set.

We can explicitly set ImageDescription using either tifftools (https://pypi.org/project/tifftools/) or tiffset (from libtiff, http://www.libtiff.org/tools.html), during which we can also specify our magnification and other Aperio metadata.

Note that because files are ALWAYS rewritten, tifftools is slower than libtiff’s tiffset and most EXIF tools, but since tifftools is pure Python, it is more easily cross-platform compatible.

Using tifftools dump to examine an existing Aperio svs file, we can see what the ImageDescription is:

 "Aperio Digital Slide 126976x94208 [0,0 126976x92416] (256x256) JPEG/RGB Q=80|AppMag = 40|MPP = 0.23|Date = 28/02/2019|Time = 14:06:50"  

Where we can see “AppMag” for apparent magnification, MPP for the microns per pixel and some other associated metadata.
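The structure of this string is simple enough to parse by hand: a free-text header, followed by "key = value" fields separated by the pipe character. The following sketch splits the example above into a dictionary; the function name is a hypothetical helper, and this is not how OpenSlide itself is implemented.

```python
def parse_aperio_description(description):
    """Split an Aperio ImageDescription into its header and key/value fields."""
    header, *fields = description.split("|")
    metadata = {}
    for field in fields:
        # Split on the first "=" only, so values such as times stay intact.
        key, sep, value = field.partition("=")
        if sep:
            metadata[key.strip()] = value.strip()
    return header.strip(), metadata


desc = (
    "Aperio Digital Slide 126976x94208 [0,0 126976x92416] (256x256) JPEG/RGB Q=80"
    "|AppMag = 40|MPP = 0.23|Date = 28/02/2019|Time = 14:06:50"
)
header, meta = parse_aperio_description(desc)
print(meta["AppMag"], meta["MPP"])
```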

There is quite a bit of flexibility, and we can modify the information as we see fit, even opting to include only a small subset. Here as a basic minimal working example, we will set the magnification and MPP, both important for HistoQC usage:

tifftools set -y -s ImageDescription  "Aperio Fake |AppMag = 40|MPP = 0.23" 17642_500_f00001_original_tiled.tif 

Note that I specify “Fake” to indicate to future users of the file that this metadata was artificially added and is not the original metadata provided by the scanner.

Lastly, when we open the file with OpenSlide, we can now see that all of our metadata is appropriately set:
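If the magnification and MPP differ between images, the description string can be generated rather than typed by hand. This is a minimal sketch with a hypothetical helper name; it only builds the string in the same layout as the command above, which can then be passed to tifftools.

```python
def make_aperio_description(metadata, label="Fake"):
    """Build an ImageDescription that starts with "Aperio" and lists key = value fields."""
    for key, value in metadata.items():
        # The pipe character separates fields, so it cannot appear inside them.
        if "|" in str(key) or "|" in str(value):
            raise ValueError("keys and values must not contain '|'")
    fields = "".join(f"|{key} = {value}" for key, value in metadata.items())
    return f"Aperio {label} {fields}"


description = make_aperio_description({"AppMag": 40, "MPP": 0.23})
print(description)
# Aperio Fake |AppMag = 40|MPP = 0.23
```

Keeping a label such as “Fake” as the default makes it harder to forget to mark the metadata as artificial.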

>>> osh  = openslide.OpenSlide("17642_500_f00001_original_tiled_meta.tif")
>>> osh.properties
 <_PropertyMap {'aperio.AppMag': '40', 'aperio.MPP': '0.23', 'openslide.comment': 'Aperio Fake |AppMag = 40|MPP = 0.23', 'openslide.level-count': '4', 'openslide.level[0].downsample': '1', 'openslide.level[0].height': '2000', 'openslide.level[0].tile-height': '256', 'openslide.level[0].tile-width': '256', 'openslide.level[0].width': '2000', 'openslide.level[1].downsample': '2', 'openslide.level[1].height': '1000', 'openslide.level[1].tile-height': '256', 'openslide.level[1].tile-width': '256', 'openslide.level[1].width': '1000', 'openslide.level[2].downsample': '4', 'openslide.level[2].height': '500', 'openslide.level[2].tile-height': '256', 'openslide.level[2].tile-width': '256', 'openslide.level[2].width': '500', 'openslide.level[3].downsample': '8', 'openslide.level[3].height': '250', 'openslide.level[3].tile-height': '256', 'openslide.level[3].tile-width': '256', 'openslide.level[3].width': '250', 'openslide.mpp-x': '0.23000000000000001', 'openslide.mpp-y': '0.23000000000000001', 'openslide.objective-power': '40', 'openslide.quickhash-1': '05d86bf08deb3a9bfa5d2dd9e0203d73f38bfb567159748afda09a6976aabc95', 'openslide.vendor': 'aperio', 'tiff.ImageDescription': 'Aperio Fake |AppMag = 40|MPP = 0.23', 'tiff.ResolutionUnit': 'centimeter', 'tiff.XResolution': '28.350000477234143', 'tiff.YResolution': '28.350000477234143'}>
>>> osh.properties['openslide.objective-power']
 '40'

We can also see that our exact comment is reproduced in the openslide.comment property:

>>> osh.properties['openslide.comment']
 'Aperio Fake |AppMag = 40|MPP = 0.23'

Thankfully, this now allows our HistoQC pipeline to successfully analyze the file. Importantly, though, given that this isn’t a WSI, ROI-specific parameters are likely needed.

That’s it! Best of luck!

5 thoughts on “Converting an existing image into an Openslide compatible format”

  1. Computational power: If one of the desired benefits of going digital is the ability to employ sophisticated machine learning algorithms, computational power may be another critical issue to consider. While image analysis is meant to enhance the routine work of pathologists, if the time taken for computational evaluation exceeds anticipated turn-around-time, the overall value proposition becomes weaker. As such, sufficient high-performance computing resources should be available based on expected throughput. Although digital slide images can be used for both clinical diagnostics and research, the systems supporting each workflow are often not compatible. Research typically requires greater openness, such as the ability to manipulate image data directly by executing scripts in third-party software. After validation, these algorithms may be integrated into a diagnostic IMS.

  2. Hi Andrew,

    Actually at our lab we’ve been working on a similar “Aperio Mimic” format, and I thought I’d share some of what we’ve found on how to extend this to other software. The software we’ve been testing against are: Concentriq, GIMP, HALO, Qupath, Aperio ImageScope, openslide and libtiff.

    Our original version used VIPS to extract the image, but IIRC that doesn’t support CZI and in any case for broad input format support you really need bioformats, which will automatically extract the image from the correct directory in the input image and convert it to a standard form. We use bioformats’ “bfconvert” to save any of over 100 formats as a standard TIFF with no metadata.

    We also run bioformats’ “showinf” to generate an OME-XML document which allows us to extract the resolution and magnification info (the lowest level tags you need are “physicalsizexunit”, “physicalsizex”, and “nominalmagnification”- the rest of the tree is left as an exercise for the reader 🙂 ).

    We then have to do a second copy using libtiff’s “tiffcp” to set the PlanarConfiguration as “contiguous”. Aperio uses this planar config and some software takes it for granted, even though it’s technically not in the Aperio specification.

    This second copy could possibly be avoided if you made a small change to the bioformats library and recompiled, described here: https://github.com/ome/bioformats/issues/3628 , although we have not tested this.

    Following all that we use a similar process to yours to add the metadata to the final mimic tiff based on the Openslide Aperio format description. One little gotcha is that Openslide just needs the first word in ImageDesc to be “Aperio”, but HALO and qupath need it to say “Aperio Image Library ” 🙂 We use v12.0.15 as our version number- no idea if this matters.

    Openslide is easier than some of the other software. It doesn’t need a thumbnail or a pyramid (qupath does), it doesn’t need a specific planar configuration (HALO does- my notes say GIMP does too, although I can’t imagine why), and it doesn’t need a label (some of our use cases do).

    We’re still working on finalizing this format but we’re most of the way there. I hope to release it on my github page: https://github.com/markemus?tab=repositories once it’s ready assuming I can get permission, but no promises.

    Hopefully this helps save someone else dozens of hours!
