Image segmentation in the cloud

Motivation and purpose

Fitting complex machine learning models to extensive data sets is a computationally demanding task. We decided to explore the possibilities of cloud computing on different platforms and to try some conceptually simple but technically challenging use cases. As one example, we used a deep-learning convolutional neural network to recognise buildings from aerial imagery. Our goal was to train a model, and ourselves, to understand how patterns can be recognised from images in general and how this can be implemented on cloud infrastructure in a relatively short time.

Technical punchlines

We used RoboSat, an open-source toolchain written in Python. It covers data preparation, model training and prediction, as well as post-processing of the results. The modelling machinery is implemented with the Python library PyTorch and is formally a U-Net convolutional neural network, an architecture developed for image segmentation and described in detail in Ronneberger et al. (2015).
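
To make the architecture concrete, the sketch below shows the basic U-Net idea in PyTorch: a convolutional encoder, a decoder that upsamples back to the input resolution, and a skip connection that concatenates high-resolution features into the decoder. This is a deliberately tiny illustration of the principle, not RoboSat's actual model, and the names in it (TinyUNet, double_conv) are ours.

# Minimal U-Net-style encoder-decoder sketch (illustration only; RoboSat ships
# its own, more complete implementation).
import torch
import torch.nn as nn

def double_conv(in_ch, out_ch):
    # Two 3x3 convolutions with ReLU, the basic U-Net building block.
    return nn.Sequential(
        nn.Conv2d(in_ch, out_ch, kernel_size=3, padding=1),
        nn.ReLU(inplace=True),
        nn.Conv2d(out_ch, out_ch, kernel_size=3, padding=1),
        nn.ReLU(inplace=True),
    )

class TinyUNet(nn.Module):
    def __init__(self, num_classes=2):
        super().__init__()
        self.down1 = double_conv(3, 32)
        self.down2 = double_conv(32, 64)
        self.pool = nn.MaxPool2d(2)
        self.up = nn.Upsample(scale_factor=2, mode="bilinear", align_corners=False)
        self.up1 = double_conv(64 + 32, 32)   # skip connection is concatenated here
        self.out = nn.Conv2d(32, num_classes, kernel_size=1)

    def forward(self, x):
        d1 = self.down1(x)                          # high-resolution features
        d2 = self.down2(self.pool(d1))              # downsampled features
        u1 = self.up(d2)                            # upsample back to input resolution
        u1 = self.up1(torch.cat([u1, d1], dim=1))   # fuse coarse context with fine detail
        return self.out(u1)                         # per-pixel class scores

model = TinyUNet(num_classes=2)
scores = model(torch.randn(1, 3, 256, 256))         # -> shape (1, 2, 256, 256)

The skip connections are the point of the design: they let the network combine coarse context with fine pixel-level detail, which is exactly what tracing building outlines requires.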

We turned to cloud infrastructure mainly to harness GPU instances. Our partners at Oracle happened to provide Siili with credits for their recently refurbished Oracle Cloud Infrastructure (OCI), so it was convenient to build our pipeline there. Oracle Cloud Day 2018 was also held in late November, which gave us a chance to demonstrate machine learning on their platform.

Our original plan was to deploy a Kubernetes cluster with GPU instances, but provisioning these resources took too long, so we decided to use a single bare metal machine instead. It provides ample computing power and low latency, served from Frankfurt. In detail, the instance is a BM.GPU2.2 with 28 CPU cores, 192 GB of RAM and two NVIDIA Tesla P100 GPUs. Configuring the machine took some time, since harnessing the power of the GPUs meant installing all the drivers and proprietary software manually. Retrieving and processing the data also took a lot of time and effort: there were readily available tools for parts of this, but not for the whole workflow, and the ones at hand needed customisation. The actual model training was rather quick, thanks to the fast implementation of the modelling algorithm and the computational power of the cloud infrastructure.
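
Once the drivers and CUDA stack were in place, a quick way to confirm that PyTorch actually sees the hardware is a check along these lines (illustrative; on the BM.GPU2.2 it should report two Tesla P100 cards):

# Sanity check that PyTorch sees the GPUs after the manual driver setup.
import torch

print("CUDA available:", torch.cuda.is_available())
print("GPU count:", torch.cuda.device_count())
for i in range(torch.cuda.device_count()):
    print(f"GPU {i}: {torch.cuda.get_device_name(i)}")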

Outcome & conclusions

After the data processing, training the model and producing predictions was rather straightforward. For production use, the hyperparameters of the model would need tuning and the predictions would need post-processing, but we were content with the raw results we were able to get with moderate computation time.
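
To show where those hyperparameters enter, here is a schematic PyTorch training loop in the spirit of what the toolchain does internally. It reuses the TinyUNet sketch from above, the data is synthetic, and the learning rate, batch size and epoch count are placeholders rather than the settings we used.

import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset

# Hyperparameters that would need proper tuning for production use;
# the values below are placeholders.
learning_rate = 1e-3
batch_size = 4
num_epochs = 2

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model = TinyUNet(num_classes=2).to(device)      # sketch model from the earlier snippet

# Synthetic stand-in for the real tile dataset: random RGB tiles and
# per-pixel class masks (0 = background, 1 = building).
images = torch.randn(16, 3, 256, 256)
masks = torch.randint(0, 2, (16, 256, 256))
loader = DataLoader(TensorDataset(images, masks), batch_size=batch_size, shuffle=True)

criterion = nn.CrossEntropyLoss()               # per-pixel classification loss
optimizer = torch.optim.Adam(model.parameters(), lr=learning_rate)

for epoch in range(num_epochs):
    for x, y in loader:
        x, y = x.to(device), y.to(device)
        optimizer.zero_grad()
        loss = criterion(model(x), y)           # logits (N, 2, H, W) vs targets (N, H, W)
        loss.backward()
        optimizer.step()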

As seen from the video, there were some inaccuracies in the results. Some of them would be rather easy to fix with more extensive post-processing. So-called hard negative mining could also reduce false positive predictions, such as houses predicted on top of roads. Unsurprisingly, most of the time went into configuration and data processing. Now that the infrastructure and modelling pipeline are established, developing and fine-tuning the model would be more straightforward. The RoboSat toolchain was in principle well documented, but applying the tools in practice required quite a bit of handwork.
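
As one example of the kind of cleanup meant here (RoboSat has its own post-processing tools; the snippet below is just an illustration using scikit-image), small isolated blobs in a predicted building mask can be dropped, which removes many stray false positives:

# Drop predicted building blobs below a minimum pixel area to suppress small
# false positives such as stray detections on roads. The threshold is illustrative.
import numpy as np
from skimage.morphology import remove_small_objects

def clean_mask(mask: np.ndarray, min_pixels: int = 100) -> np.ndarray:
    """Remove connected components smaller than min_pixels from a binary mask."""
    cleaned = remove_small_objects(mask.astype(bool), min_size=min_pixels)
    return cleaned.astype(np.uint8)

# Example: a 512x512 predicted mask with one plausible footprint and one speckle.
pred = np.zeros((512, 512), dtype=np.uint8)
pred[100:200, 100:200] = 1   # plausible building footprint, kept
pred[10:13, 10:13] = 1       # tiny blob, likely a false positive, removed
cleaned = clean_mask(pred, min_pixels=100)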

Authors: Olli Ritari & Anna Norberg