This year has already proven to be a game-changer for next-generation artificial intelligence performance, as the race continues to solve the challenges of scalable deep learning. Most popular deep learning frameworks today can scale across multiple GPUs within a server, but scaling across multiple servers with GPUs remains difficult.
This challenge in particular is where Mellanox has been the clear leader, as the only interconnect solution able to deliver the performance and offload capabilities needed to unlock the power of scalable AI.
IBM Research just announced an impressive achievement: unprecedented performance and near-ideal scaling with its new distributed deep learning software, which delivered record-low communication overhead and 95% scaling efficiency on the Caffe deep learning framework using Mellanox InfiniBand and 256 NVIDIA GPUs across 64 IBM Power systems.
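To make the 95% figure concrete, scaling efficiency is just the actual speedup divided by the ideal linear speedup for the number of GPUs used. The sketch below uses hypothetical timings (the single-GPU and 256-GPU hours are illustrative, not from the paper); only the 95% efficiency and 256-GPU count come from the announcement.

```python
def scaling_efficiency(t_single: float, t_parallel: float, n_workers: int) -> float:
    """Actual speedup (t_single / t_parallel) divided by ideal linear speedup (n_workers)."""
    speedup = t_single / t_parallel
    return speedup / n_workers

# Hypothetical example: a job taking 2560 hours on 1 GPU and about 10.5 hours
# on 256 GPUs comes out to roughly 95% scaling efficiency.
eff = scaling_efficiency(2560.0, 10.53, 256)
print(f"{eff:.0%}")  # → 95%
```

At 95% efficiency, 256 GPUs behave like roughly 243 ideal GPUs, which is why communication overhead is the critical factor at this scale.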
With the IBM DDL (Distributed Deep Learning) library, training ResNet-101 on ImageNet-22K took just 7 hours. Cutting training time from 16 days down to 7 hours changes the workflow of data scientists.
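The wall-clock reduction above implies a speedup factor that is easy to check directly:

```python
# Speedup implied by the reported training-time reduction (16 days -> 7 hours).
days_before = 16
hours_after = 7

speedup = days_before * 24 / hours_after
print(f"~{speedup:.0f}x faster")  # → ~55x faster
```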
You can read more on the IBM blogs: https://www.ibm.com/blogs/research/2017/08/distributed-deep-learning/ and https://www.ibm.com/blogs/systems/scaling-tensorflow-and-caffe-to-256-gpus/
You can also download the whitepaper here: https://arxiv.org/abs/1708.02188
A technical preview of this IBM Research Distributed Deep Learning code is available today in the IBM PowerAI 4.0 distribution for TensorFlow and Caffe.