Real time Data Augmentation using Nvidia Digits + Python Layer

One of the common ways of increasing the size of a training set is to augment the original data with a set of modified patches. These modifications often include (a) rotations, (b) mirroring, (c) lighting adjustment, (d) affine transformations (shearing, etc.), (e) magnification changes, (f) addition of noise, etc. This blog post discusses how to perform the most trivial modification, rotation, in real time using a Python layer through Nvidia Digits. Given this code, it should be easy to add other desired augmentations.

I was rather surprised not to find a working example of rotational data augmentation for Caffe, and thus am presenting this code here for others to use. This post was highly inspired by the Nvidia Digits tutorial on using a Python layer, available here.


Real-time data augmentation also brings a number of additional benefits.

First off, instead of having to write N * R patches to a database (where N is the number of patches and R is the number of rotations being considered), we can simply write N. This immediately makes writing the database R times faster (R = 4 for rotations of 0, 90, 180 and 270 degrees).

Secondly, since the database is now 1/R the size of one that stores the augmentations, larger training sets can fit entirely in memory, which can drastically improve training speed. The LMDB backend relies on operating-system caching: as pages are read from disk they are kept in available memory, and when the operating system detects low memory or decides that a new page should replace an old one, the older page is evicted. Normally this is beneficial, since over time the most frequently used items tend to stay in cache and calls to disk drop drastically. In our case, however, we iterate sequentially through a very large database, so pages are constantly being loaded and then evicted with no speedup. If the database shrinks enough that the entire file fits in memory, all of its pages can stay cached, bringing disk reads after the first pass down to zero, a substantial efficiency improvement.
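A back-of-the-envelope calculation makes the 1/R reduction concrete. The numbers are hypothetical (100,000 RGB patches of 32 x 32 pixels stored as 8-bit values, R = 4 for the four 90-degree rotations), and only raw pixel bytes are counted, ignoring LMDB and Caffe Datum overhead:

```python
def raw_pixel_bytes(num_patches, height, width, channels, copies_per_patch=1, bytes_per_value=1):
    """Raw pixel payload of a patch database (hypothetical helper; ignores LMDB/Datum overhead)."""
    return num_patches * copies_per_patch * height * width * channels * bytes_per_value


N, R = 100000, 4  # hypothetical: 100,000 patches, rotations of 0/90/180/270 degrees
pre_augmented = raw_pixel_bytes(N, 32, 32, 3, copies_per_patch=R)  # N * R patches written
real_time = raw_pixel_bytes(N, 32, 32, 3)                           # only N patches written

print(pre_augmented / 1e9, real_time / 1e9)  # about 1.23 GB versus about 0.31 GB
```

Whether the smaller figure fits in the page cache depends on the machine's free memory, which is the point of the paragraph above: the same training signal now needs a quarter of the cache.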

Lastly, writing the database once (often from randomly extracted patches) creates a fixed snapshot of the training set, which can easily be benchmarked against other augmentation and training approaches. For example, instead of having to re-create a database with the desired noise or rotation characteristics, we can simply implement them in a real-time layer, so the data is augmented before being fed into the network while the original data stays untouched.

The one major caveat is the computational cost of the augmentation layer. Special attention needs to be paid to its implementation, since the code will be called millions of times and any small inefficiency compounds into a massive time penalty. In many cases, this penalty can be greater than the cost of reading the patches from disk. For rotations by multiples of 90 degrees, though, the cost is very small: NumPy has a highly optimized function for exactly this purpose, numpy.rot90, which changes how the array is indexed rather than performing any computation on the pixel values (as would be required for rotations by angles that are not multiples of 90 degrees, which need interpolation).
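This claim is easy to check: numpy.rot90 rotates in the plane of the first two axes by default and returns a view built from a flip and a transpose, so no pixel data is copied. The patch below is a made-up H x W x K array; its non-square shape shows that a 90-degree rotation swaps H and W, which is why writing a rotated patch back into a fixed-size batch slot only works for square patches.

```python
import numpy as np

# Hypothetical 4 x 3 patch with 2 channels, in H x W x K order
patch = np.arange(4 * 3 * 2).reshape(4, 3, 2)

# 90 degrees counterclockwise, in the plane of axis 0 (H) and axis 1 (W)
rotated = np.rot90(patch)

print(rotated.shape)                                # (3, 4, 2): height and width swapped
print(np.shares_memory(patch, rotated))             # True: a view, no pixel data copied
print(np.array_equal(rotated[0, 0], patch[0, -1]))  # True: old top-right pixel is now top-left
```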



There are two parts to implement: first, we need to write the Python layer, and second, we need to include the Python layer in the network architecture. The Python layer is available here and is mostly boilerplate, so we'll discuss only the unique parts. The most important part is the main for loop:

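    # iterate over every patch in the N x K x H x W batch
    for ii in xrange(0, top[0].data.shape[0]):
        # K x H x W -> H x W x K
        imin = top[0].data[ii, :, :, :].transpose(1, 2, 0)
        # augment, then H x W x K -> K x H x W and write back into the blob
        top[0].data[ii, :, :, :] = doAugmentation(imin).transpose(2, 0, 1)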

We can see that we need to iterate over each patch in the batch, and that the batch shape is N x K x H x W, where N is the number of patches, K is the number of channels, H is the height and W is the width.
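The shape bookkeeping can be checked with a plain NumPy array standing in for top[0].data (a hypothetical batch of 2 three-channel 5 x 5 patches; float32 matches what Caffe blobs hold):

```python
import numpy as np

# Stand-in for top[0].data: N x K x H x W = 2 patches, 3 channels, 5 x 5 pixels
batch = np.random.rand(2, 3, 5, 5).astype(np.float32)

patch = batch[0, :, :, :]          # one patch, K x H x W
hwk = patch.transpose(1, 2, 0)     # H x W x K, the layout most image code expects
print(patch.shape, hwk.shape)      # (3, 5, 5) (5, 5, 3)

back = hwk.transpose(2, 0, 1)      # the inverse permutation restores K x H x W
print(np.array_equal(back, patch)) # True
```

Both transposes are views, so this reordering costs nothing beyond changing strides.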

For each patch, we first extract it and transpose it from K x H x W order into H x W x K order. We then rotate it using the doAugmentation function, and finally transpose the result back to the original order before placing it back into the outgoing data blob.
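The body of doAugmentation is not shown in the post. A minimal sketch of what it could look like for random 90-degree rotations follows; the function body and the optional rng argument are assumptions, not the author's code, and it assumes square patches (H == W) so the result fits back into the same batch slot.

```python
import numpy as np


def doAugmentation(imin, rng=np.random):
    """Hypothetical sketch: rotate an H x W x K patch by a random multiple of 90 degrees.

    Assumes H == W, so the rotated patch keeps the shape of its slot in the batch.
    """
    k = rng.randint(0, 4)     # 0, 1, 2 or 3 quarter turns -> 0, 90, 180 or 270 degrees
    return np.rot90(imin, k)  # rotates in the H-W plane; the channel axis is untouched


# Applying it to a stand-in batch the same way the layer's loop does
batch = np.random.rand(8, 3, 32, 32).astype(np.float32)  # N x K x H x W
for ii in range(batch.shape[0]):
    imin = batch[ii, :, :, :].transpose(1, 2, 0)
    batch[ii, :, :, :] = doAugmentation(imin).transpose(2, 0, 1)
```

Passing a seeded np.random.RandomState as rng makes the rotations reproducible when debugging.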

One very important detail: depending on how you set up the database (for example, using Caffe's built-in database creation tools or the Digits database tools), the color channels may not be stored in RGB order but in BGR order (the Caffe default). This is extremely important to take into account for any color-based augmentation (lighting, color deconvolution, etc.), but it plays no role in rotation, which is why it is absent here.

Should one need to do color modifications, the code would look something like this (untested), which swaps the R and B channels before performing the augmentation and again afterwards:

    for ii in xrange(0, top[0].data.shape[0]):
        imin = top[0].data[ii, :, :, :].transpose(1, 2, 0)  # change to H x W x K
        imin = imin[:, :, (2, 1, 0)]  # change from BGR to RGB
        # augment in RGB, swap back to BGR, then return to K x H x W
        top[0].data[ii, :, :, :] = doAugmentation(imin)[:, :, (2, 1, 0)].transpose(2, 0, 1)
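To make the channel-order issue concrete, here is a sketch with a hypothetical color augmentation, doColorAugmentation, which boosts only the red channel, wrapped in the same BGR-to-RGB swap as the loop above. A plain NumPy array stands in for top[0].data. Channel-specific operations like this are exactly where forgetting the swap would silently modify the blue channel instead.

```python
import numpy as np


def doColorAugmentation(imin_rgb, red_gain=1.5):
    """Hypothetical color augmentation: scale the red channel of an H x W x 3 RGB patch."""
    out = imin_rgb.astype(np.float32)  # astype returns a copy, so the input is untouched
    out[:, :, 0] *= red_gain           # channel 0 is red only because the caller swapped BGR -> RGB
    return np.clip(out, 0, 255)


def augment_bgr_batch(batch):
    """batch: N x 3 x H x W float32 array in Caffe's BGR order, modified in place."""
    for ii in range(batch.shape[0]):
        imin = batch[ii, :, :, :].transpose(1, 2, 0)       # K x H x W -> H x W x K
        imin = imin[:, :, (2, 1, 0)]                        # BGR -> RGB
        out = doColorAugmentation(imin)[:, :, (2, 1, 0)]    # RGB -> BGR
        batch[ii, :, :, :] = out.transpose(2, 0, 1)        # H x W x K -> K x H x W
    return batch


# Constant patch with B = 10, G = 20, R = 100, stored in BGR order as Caffe does
batch = np.zeros((1, 3, 2, 2), dtype=np.float32)
batch[:, 0] = 10
batch[:, 1] = 20
batch[:, 2] = 100
augment_bgr_batch(batch)
print(batch[0, :, 0, 0])  # B and G unchanged, R scaled from 100 to 150
```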


Afterwards, we need to include the layer in the architecture. In particular, we only want it to run during the training stage and not the validation or testing stage, which we indicate below using the include directive. This is usually the best approach, as we want to train on “complex” augmented data but test on real-world data. Notice that we place this layer right below the data layer, since augmentation is the first thing we want to happen to the data before the rest of the network operates:

    layer {
      name: "rotation_layer"
      type: "Python"
      bottom: "data"
      top: "data"
      include {
        phase: TRAIN
      }
      python_param {
        module: "digits_python_layers"
        layer: "RotationLayer"
      }
    }

Note that the layer value in python_param ("RotationLayer") must match the name of the class defined in the Python layer file.
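Since the layer file itself is only linked, here is a hedged sketch of what a complete RotationLayer class could look like, following the standard pycaffe Python-layer interface (a class derived from caffe.Layer with setup, reshape, forward and backward methods). It is not the author's file: the error message, the use of np.random and the explicit copy are assumptions, it assumes square patches, and the try/except fallback exists only so the class can be exercised without Caffe installed.

```python
import numpy as np

try:
    import caffe
    _LayerBase = caffe.Layer
except ImportError:  # fallback so the sketch can be tested without Caffe installed
    _LayerBase = object


class RotationLayer(_LayerBase):
    """Train-time augmentation: rotate each patch by a random multiple of 90 degrees.

    Sketch only; assumes square patches (H == W) so rotated patches keep their shape.
    """

    def setup(self, bottom, top):
        if len(bottom) != 1 or len(top) != 1:
            raise Exception("RotationLayer expects exactly one bottom and one top blob")

    def reshape(self, bottom, top):
        # The output has the same N x K x H x W shape as the input
        top[0].reshape(*bottom[0].data.shape)

    def forward(self, bottom, top):
        # For the in-place "data" -> "data" configuration this copies the blob onto itself
        top[0].data[...] = bottom[0].data
        for ii in range(top[0].data.shape[0]):
            imin = top[0].data[ii, :, :, :].transpose(1, 2, 0)  # K x H x W -> H x W x K
            k = np.random.randint(0, 4)                         # 0, 90, 180 or 270 degrees
            # rot90 returns a view into the same blob; the explicit copy keeps the
            # write-back from reading memory it is overwriting
            rotated = np.rot90(imin, k).copy()
            top[0].data[ii, :, :, :] = rotated.transpose(2, 0, 1)

    def backward(self, top, propagate_down, bottom):
        # The layer transforms input data, which needs no gradient, so nothing is propagated
        pass
```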


From here, the Digits tutorial can be followed; the main idea is simply to add the Python layer to the network definition.



To test the Python layer, I include it in all phases (i.e., remove the include directive), so that the applied augmentation can be seen in Nvidia Digits.



Here we can see a simple test in which I submitted a patch containing the number 5 (in blue), and the data layer output (red circle) shows that the rotation has in fact taken place. Since the rotation is chosen randomly, submitting the same patch multiple times shows it rotated by 0, 90, 180, or 270 degrees.

As a final comment, additional augmentations can of course be added inside the doAugmentation function.
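As one example of such an extension, a random left-right mirror combined with the random 90-degree rotation covers all eight symmetries of a square patch while still only re-indexing the array. The function name matches the text; the body and the rng parameter are a hypothetical sketch.

```python
import numpy as np


def doAugmentation(imin, rng=np.random):
    """Hypothetical extension: random mirror + random 90-degree rotation of an H x W x K patch.

    2 mirror choices x 4 rotations give all 8 symmetries of a square patch.
    """
    if rng.randint(0, 2):        # mirror half of the time
        imin = imin[:, ::-1, :]  # flip left-right by reversing the W axis (a view, no copy)
    return np.rot90(imin, rng.randint(0, 4))
```

Mirroring is applied before rotating, so a vertical flip does not need its own branch: it is a left-right mirror followed by a 180-degree rotation.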

8 thoughts on “Real time Data Augmentation using Nvidia Digits + Python Layer”

  1. I tried to follow your tutorial, but DIGITS stops with error code -6. I cannot find anything helpful online on how to solve the problem. Do you have any clue about the reasons for the error?

    DIGITS version: 5.1-dev

    1. in fact no, but keep in mind that caffe internally stores the data in batch x color x height x width. as well the color channels are *not* RGB, but BGR. so any technique needs to keep that in mind. i was expecting to build more into this code, so i “undid” the transformations, but it's not strictly needed. there is a different more efficient version available here that i haven’t written up yet

      1. Thanks! It is very useful. I tried to apply some augmentation techniques using skimage (Python 3+ version), but my Caffe is installed with Python 2.7, so it was causing some issues.

        1. yes, i find that trying to work with 2 versions of python pretty frustrating. i usually end up doing everything with 2.7 since i have so much code already developed for that version the sunk cost in upgrading isn’t worth it (yet)
