This post is the first of a three-part series on distributed training of neural networks.
In Part 1, we’ll look at how distributed computing on GPUs can significantly accelerate the training of deep learning models, discuss some of the challenges involved, and examine current research on the topic. We’ll also consider when distributed training of neural networks is, and isn’t, appropriate for a particular use case.
In Part 2, we’ll take a hands-on look at Deeplearning4j’s implementation of network training on Apache Spark and provide an end-to-end example of how to perform training in practice.
Finally, in Part 3 we’ll peek under the hood of Deeplearning4j’s Spark implementation and discuss some of the design challenges involved in maximizing training performance with Apache Spark. We’ll also look at how Spark interacts with the native high-performance computing libraries and off-heap memory management that Deeplearning4j uses.