This is the second-to-last blog post in the series (the first one here, second one here), where we go into greater detail about how we can use Ray Serve to set up a server that waits to respond to our processing requests. These last two are the most complex blog posts in the series and require some understanding of how HTTP, REST, and web services work. You can find relevant prereading here.
Ray Serve is a scalable model serving library for building online inference APIs. Serve is framework agnostic, so you can use a single toolkit to serve everything from deep learning models built with frameworks like PyTorch, TensorFlow, and Keras, to Scikit-Learn models, to arbitrary Python business logic.
Although classification metrics are good for summarizing a model’s performance on a dataset, they disconnect the user from the data itself. Similarly, a confusion matrix might tell us that performance is suffering because of false positives, but it obscures information about what patterns may have caused those misclassifications and what types of false positives there might be.
One way to gain interpretability is to group sampled images by the category of their output (true negative, false negative, false positive, true positive), and display them in a PowerPoint file for easy review. These visual categories make it easier to identify patterns in misclassified data that can be exploited to improve performance (e.g., hard negative mining, or image-analysis-based filtering).
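As a minimal sketch of the grouping step only (not the post's actual code), the snippet below sorts image identifiers into the four confusion-matrix categories. The function names, the `(image_id, y_true, y_pred)` tuple layout, and the file names are all hypothetical, and binary labels encoded as 0/1 are assumed.

```python
from collections import defaultdict


def confusion_category(y_true, y_pred):
    """Return 'TP', 'FP', 'FN', or 'TN' for one binary label/prediction pair."""
    if y_pred == 1:
        return "TP" if y_true == 1 else "FP"
    return "FN" if y_true == 1 else "TN"


def group_by_category(samples):
    """Group image ids by confusion category.

    samples: iterable of (image_id, y_true, y_pred) tuples (hypothetical layout).
    """
    groups = defaultdict(list)
    for image_id, y_true, y_pred in samples:
        groups[confusion_category(y_true, y_pred)].append(image_id)
    return dict(groups)


# Made-up example data: (file name, ground truth, model prediction)
samples = [
    ("img_001.png", 1, 1),
    ("img_002.png", 0, 1),
    ("img_003.png", 1, 0),
    ("img_004.png", 0, 0),
    ("img_005.png", 0, 1),
]
groups = group_by_category(samples)
for category in ("TN", "FN", "FP", "TP"):
    print(category, groups.get(category, []))
```

Each resulting list could then be rendered as one section of a slide deck, so that all false positives, for instance, can be reviewed side by side.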
This blog post describes and demonstrates a workflow that produces such a PowerPoint slide deck automatically for review, as shown below:
Digital pathology projects often require assigning a class to cells/objects. For example, you may have a segmentation of cells/glomeruli/tubules and want to identify the ones which are lymphocytes/sclerotic/distal. This classification can be done with machine or deep learning classifiers by supplying the object in question and receiving an output score that indicates the likelihood that the object is of a particular type.
This blog post will demonstrate an efficient way of using QuPath to help find the ideal likelihood threshold for your classifier.
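To make the idea of a likelihood threshold concrete, here is a small sketch that sweeps candidate thresholds over classifier scores and keeps the one with the best F1 score. This is a purely numerical illustration and not the QuPath-based approach of the post; the scores, labels, candidate grid, and the choice of F1 as the selection criterion are assumptions made for the example.

```python
def f1_at_threshold(scores, labels, threshold):
    """F1 score when objects with score >= threshold are called positive."""
    tp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 1)
    fp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 0)
    fn = sum(1 for s, y in zip(scores, labels) if s < threshold and y == 1)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def best_threshold(scores, labels, candidates):
    """Return the candidate threshold with the highest F1 (first one on ties)."""
    return max(candidates, key=lambda t: f1_at_threshold(scores, labels, t))


# Made-up classifier outputs and ground-truth labels for six objects
scores = [0.95, 0.80, 0.65, 0.40, 0.30, 0.10]
labels = [1, 1, 0, 1, 0, 0]
candidates = [i / 10 for i in range(1, 10)]  # 0.1, 0.2, ..., 0.9

best = best_threshold(scores, labels, candidates)
best_f1 = f1_at_threshold(scores, labels, best)
print(f"best threshold: {best}, F1: {best_f1:.3f}")
```

A purely numerical sweep like this needs ground-truth labels for every object, which is one reason a visual review of the thresholded results remains useful in practice.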
In digital pathology, input data is often far too large for DL models to process directly, with Whole Slide Images (WSI) around 100k x 100k pixels. This post provides a quantitative and qualitative method, with code, to help optimize important digital pathology specific hyperparameters: patch size and magnification. Optimizing these variables can decrease training times, lower hardware requirements, and reduce the amount of data required to effectively train a model.
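A quick back-of-the-envelope calculation shows why these two hyperparameters matter so much. The sketch below counts the non-overlapping patches needed to cover a 100k x 100k slide for a few patch sizes and downsample factors. The function name is hypothetical, and the mapping of downsample factors to magnifications assumes a 40x base scan, which the excerpt does not state.

```python
import math


def patches_per_slide(width_px, height_px, patch_size, downsample=1.0):
    """Count non-overlapping square patches needed to cover a slide.

    downsample=1.0 means full resolution; downsample=4.0 means each output
    pixel covers a 4x4 block of full-resolution pixels.
    """
    w = math.ceil(width_px / downsample)
    h = math.ceil(height_px / downsample)
    return math.ceil(w / patch_size) * math.ceil(h / patch_size)


# Assumption for illustration: base scan is 40x, so downsample 4 is roughly 10x
full_res_256 = patches_per_slide(100_000, 100_000, patch_size=256)
full_res_512 = patches_per_slide(100_000, 100_000, patch_size=512)
quarter_res_256 = patches_per_slide(100_000, 100_000, patch_size=256, downsample=4.0)

print(full_res_256)     # 152881
print(full_res_512)     # 38416
print(quarter_res_256)  # 9604
```

Dropping from full resolution to a 4x downsample cuts the patch count by roughly a factor of 16, which translates directly into shorter training epochs, provided the lower magnification still shows the features the model needs.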
This is an updated version of the previously described workflow for loading and classifying annotations/detections created in QuPath for use in downstream machine learning workflows. The original post described how to use Groovy, the programming language used by QuPath, to export annotations/detections as GeoJSON from within QuPath, then used a Python script to classify them, and lastly used another Groovy script to reimport them. If you are not familiar with QuPath and/or its annotations, you should probably read the original post first; it provides context for the respective workflows and will help you appreciate the more elegant approach taken here. If you are already using the described approach, you should be able to easily modify it to follow this newer approach.
The manual labeling of large numbers of objects is a frequent occurrence when training deep learning classifiers in the digital histopathology domain. Often this can become extremely tedious and potentially even insurmountable.
To aid people in this annotation process we have developed and released Quick Annotator (QA), a tool which employs a deep learning backend to simultaneously learn and aid the user in the annotation process. A pre-print explaining this tool in more detail is available [here].
Utilization of current GPUs is often limited by the ability to get the data onto and off the device quickly. More precisely, taking data from the host RAM and transferring it over the PCI-e bus to the GPU RAM is the bottleneck in many deep learning use cases.
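A rough estimate helps show how the transfer cost scales with the size of what is sent. The sketch below computes the size of one image batch and the time to move it at a given bus bandwidth. The batch shape and the 16 GB/s figure (approximately the theoretical peak of a PCI-e 3.0 x16 link) are assumptions chosen for illustration; sustained bandwidth on real hardware is typically lower.

```python
def batch_bytes(batch_size, channels, height, width, bytes_per_value):
    """Size in bytes of one dense image batch."""
    return batch_size * channels * height * width * bytes_per_value


def transfer_ms(num_bytes, bandwidth_gb_per_s):
    """Idealized host-to-device transfer time in milliseconds."""
    return num_bytes / (bandwidth_gb_per_s * 1e9) * 1e3


ASSUMED_BANDWIDTH_GB_S = 16.0  # assumption: near the PCI-e 3.0 x16 theoretical peak

# Hypothetical batch: 256 RGB patches of 224 x 224 pixels
as_float32 = batch_bytes(256, 3, 224, 224, bytes_per_value=4)
as_uint8 = batch_bytes(256, 3, 224, 224, bytes_per_value=1)

print(f"float32: {as_float32 / 1e6:.1f} MB, "
      f"{transfer_ms(as_float32, ASSUMED_BANDWIDTH_GB_S):.2f} ms")
print(f"uint8:   {as_uint8 / 1e6:.1f} MB, "
      f"{transfer_ms(as_uint8, ASSUMED_BANDWIDTH_GB_S):.2f} ms")
```

Under these assumptions, sending the patches as `uint8` and converting to floating point on the GPU moves a quarter of the bytes, which hints at why reducing the volume of data crossing the bus can pay off.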
Update-Nov 2020: Code has now been placed on GitHub which enables the reading and writing of compressed GeoJSON files at all stages of the process described below. Compression reduces the file size by approximately 93% : )
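For readers curious what compressed GeoJSON handling can look like, here is a minimal sketch using only Python's standard `gzip` and `json` modules. It is not the code from the repository mentioned above; the helper names, the file name, and the toy square polygons are hypothetical, and the compression ratio you get will depend on your data.

```python
import gzip
import json
import os
import tempfile


def write_geojson_gz(path, feature_collection):
    """Serialize a GeoJSON dict to a gzip-compressed text file."""
    with gzip.open(path, "wt", encoding="utf-8") as f:
        json.dump(feature_collection, f)


def read_geojson_gz(path):
    """Load a GeoJSON dict from a gzip-compressed text file."""
    with gzip.open(path, "rt", encoding="utf-8") as f:
        return json.load(f)


def square(x, y, size=10):
    """Closed ring for an axis-aligned square (toy stand-in for a cell outline)."""
    return [[x, y], [x + size, y], [x + size, y + size], [x, y + size], [x, y]]


# Toy FeatureCollection with 1000 square polygons
feature_collection = {
    "type": "FeatureCollection",
    "features": [
        {
            "type": "Feature",
            "geometry": {"type": "Polygon", "coordinates": [square(i * 20, 0)]},
            "properties": {},
        }
        for i in range(1000)
    ],
}

with tempfile.TemporaryDirectory() as tmp_dir:
    gz_path = os.path.join(tmp_dir, "objects.geojson.gz")
    write_geojson_gz(gz_path, feature_collection)
    compressed_size = os.path.getsize(gz_path)
    loaded = read_geojson_gz(gz_path)

raw_size = len(json.dumps(feature_collection).encode("utf-8"))
print(f"raw: {raw_size} bytes, compressed: {compressed_size} bytes")
```

Because GeoJSON repeats the same keys and similar coordinates for every object, it tends to compress well, and the round trip is lossless.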
QuPath is a digital pathology tool that has become especially popular because it is both easy to use and supports a large number of different whole slide image (WSI) file formats. QuPath is also able to perform a number of relevant analytical functions with a few mouse clicks. Of particular relevance to this blog post, the pathologists we tend to work with are either already familiar with QuPath, or find it easier to learn than other tools. As a result, QuPath has become a go-to tool for us for both the creation and review of annotations and outputs created by our algorithms.
Here we introduce a robust method using GeoJSON for exporting annotations (or cell objects) from QuPath, importing them into Python as Shapely objects, operating upon them, and then re-importing a modified version of them back into QuPath for downstream usage or review. As an example use case we will be looking at computationally identifying lymphocytes in WSIs of melanoma metastases using a deep learning classifier.