Distributed image processing using Dask

Very brief introduction to Dask

Dask is a flexible parallel computing library for analytic computing.
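As a quick taste of how it works, here is a minimal sketch using dask.delayed (the functions inc and add are just toy placeholders, not part of the image pipeline): nothing runs until .compute() is called, and independent tasks can be executed in parallel by the scheduler.

```python
from dask import delayed


@delayed
def inc(x):
    return x + 1


@delayed
def add(a, b):
    return a + b


# Build a small task graph lazily; the two inc() calls are independent
# and can run in parallel when the graph is executed.
total = add(inc(1), inc(2))
print(total.compute())  # -> 5
```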

Image processing using Dask

I am going to show how to process images on a distributed system using Dask.

The processing of one image consists of the following steps:

  • Load the image from disk. Each image contains four subimages, or extensions, which can be processed in parallel up to a certain point.
  • Each image extension is first preprocessed.
  • We subtract a bias value from each pixel. The bias is obtained from a calibration pipeline that runs independently, and the bias image again contains four extensions.
  • After the bias subtraction, a similar process called flatfielding further corrects the images.
  • We use the corrected images to detect features.
  • All the features extracted from the four extensions are used to compute the registration model between them.
  • The model is then used to register the images.
  • The correctly registered images are then stacked together.

The figure below shows a flow graph of the above pipeline. It is clear that many steps can be done in parallel.

Pipeline graph for the processing of one image.
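This graph can be expressed with dask.delayed. The sketch below builds the task graph for one image; all the processing functions (load_extension, subtract_bias, flatfield, detect_features, compute_registration, register, stack) are hypothetical stand-ins for the real reduction code, stubbed out here only so the example runs.

```python
import numpy as np
from dask import delayed

# Hypothetical stand-ins for the real reduction steps (not part of the
# original pipeline code); each operates on a single image extension.
def load_extension(path, i): return np.ones((64, 64))
def subtract_bias(ext, bias): return ext - bias
def flatfield(ext, flat): return ext / flat
def detect_features(ext): return [(32, 32)]
def compute_registration(features): return "identity"
def register(ext, model): return ext
def stack(extensions): return np.mean(extensions, axis=0)


def process_image(path, bias, flat, n_ext=4):
    """Build the delayed task graph that reduces one image."""
    corrected, features = [], []
    for i in range(n_ext):
        ext = delayed(load_extension)(path, i)      # load one extension
        ext = delayed(subtract_bias)(ext, bias[i])  # bias correction
        ext = delayed(flatfield)(ext, flat[i])      # flatfield correction
        corrected.append(ext)
        features.append(delayed(detect_features)(ext))
    # Registration needs the features of all four extensions at once.
    model = delayed(compute_registration)(features)
    registered = [delayed(register)(ext, model) for ext in corrected]
    # Finally, stack the registered extensions into one image.
    return delayed(stack)(registered)
```

Because every call is wrapped in delayed, the four extensions are loaded, bias-corrected, flatfielded and feature-extracted independently; only the registration step joins them.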

Now we can create another pipeline that processes four single images and combines them at the end. This is how it looks:

Pipeline graph for the processing of four images.

Since the bias and flatfield calibrations are the same for all four images, the corresponding tasks can also be shared between the per-image sub-graphs:

Pipeline graph for the processing of four images with shared calibrations.
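In code, the shared-calibration version could look like the sketch below, reusing the hypothetical process_image from the earlier sketch. load_calibration, combine and the file names are equally made up; the point is that the same delayed bias and flat tasks are passed to every per-image sub-graph, so Dask computes each calibration extension only once.

```python
from dask import delayed

# Hypothetical helpers, again just placeholders.
def load_calibration(kind, i): return 1.0
def combine(images): return sum(images) / len(images)

# One delayed task per calibration extension, shared by all four images.
bias = [delayed(load_calibration)("bias", i) for i in range(4)]
flat = [delayed(load_calibration)("flat", i) for i in range(4)]

paths = ["image1.fits", "image2.fits", "image3.fits", "image4.fits"]
final = delayed(combine)([process_image(p, bias, flat) for p in paths])

result = final.compute()  # run the whole graph; independent branches in parallel
# final.visualize()       # would render a graph like the figures above
```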
