
Quick Comparison of TensorFlow GPU Performance on AWS P2 and G2 Instances

Written by maciej | Nov 3, 2016 4:01:39 PM

TensorFlow GPU performance on AWS p2 instances is 2x to 3x faster than on previous-generation g2 instances across a variety of convolutional neural networks.

Recently, we made our Bitfusion Deep Learning AMIs available on the newly announced AWS P2 instances. Naturally, one of the first questions that arises is: how does the performance of the new P2 instances compare to that of the previous-generation G2 instances? In this post we take a quick look at single-GPU performance across a variety of convolutional neural networks. To keep things consistent, we start each EC2 instance from the exact same AMI, keeping the driver, CUDA, cuDNN, and framework identical across the instances.
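One quick way to confirm that both instance types really see the same software stack is to print the driver, CUDA toolkit, and TensorFlow versions on each machine. The short Python sketch below is illustrative only (it is not part of the benchmark) and assumes that nvidia-smi and nvcc are on the PATH:

    # Illustrative environment check, not part of the benchmark itself.
    import subprocess
    import tensorflow as tf

    def run(cmd):
        """Run a shell command and return its trimmed text output."""
        return subprocess.check_output(cmd, shell=True).decode("utf-8").strip()

    # GPU model and driver version as reported by the NVIDIA driver.
    print(run("nvidia-smi --query-gpu=name,driver_version --format=csv,noheader"))

    # CUDA toolkit version (assumes nvcc is on the PATH).
    print(run("nvcc --version | tail -n 1"))

    # TensorFlow version baked into the AMI.
    print("TensorFlow: " + tf.__version__)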

TensorFlow GPU Performance

To evaluate TensorFlow performance we used the Bitfusion TensorFlow AMI along with the convnet-benchmarks suite to measure forward and backward propagation times for some of the better-known convolutional neural networks, including AlexNet, Overfeat, VGG, and GoogleNet. Because of their much larger GPU memory of 12 GiB, the P2 instances can accommodate much larger batch sizes than the G2 instances. For the benchmarks below, the batch size for each network was selected so that it could run on both the G2 and the P2 instances. The tables below summarize the results obtained for the G2 and P2 instances:

 
g2.2xlarge - NVIDIA K520

Network     Batch Size   Forward Pass (ms)   Backward Pass (ms)   Total Time (ms)
AlexNet     512          502                 914                  1416
Overfeat    256          1134                2934                 4068
VGG         64           750                 2550                 3300
GoogleNet   128          600                 1587                 2187

p2.xlarge - NVIDIA K80

Network     Batch Size   Forward Pass (ms)   Backward Pass (ms)   Total Time (ms)
AlexNet     512          254                 462                  716
Overfeat    256          427                 847                  1274
VGG         64           423                 869                  1292
GoogleNet   128          341                 783                  1124
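For reference, the timings above are produced by running each graph repeatedly and averaging the wall-clock time for the forward pass and for the forward plus backward pass. The sketch below is a rough illustration of that kind of loop using a single stand-in convolutional layer and the graph/session API of the 2016-era TensorFlow shipped on the AMI; it is not the actual convnet-benchmarks code.

    # Rough illustration of a forward/backward timing loop; the real
    # convnet-benchmarks scripts build the full AlexNet/Overfeat/VGG/GoogleNet graphs.
    import time
    import tensorflow as tf

    batch_size = 128
    steps = 10

    # Stand-in convolutional layer with random inputs and weights.
    images = tf.Variable(tf.random_normal([batch_size, 224, 224, 3]))
    kernel = tf.Variable(tf.random_normal([11, 11, 3, 96], stddev=0.1))
    conv = tf.nn.relu(tf.nn.conv2d(images, kernel, strides=[1, 4, 4, 1], padding="SAME"))
    loss = tf.reduce_sum(conv)
    grads = tf.gradients(loss, [kernel])  # backward pass w.r.t. the weights

    def time_op(sess, op):
        # One warm-up run, then average wall-clock time over several runs.
        sess.run(op)
        start = time.time()
        for _ in range(steps):
            sess.run(op)
        return (time.time() - start) / steps * 1000.0  # milliseconds

    with tf.Session() as sess:
        sess.run(tf.initialize_all_variables())
        print("forward (ms):          %.1f" % time_op(sess, conv))
        print("forward+backward (ms): %.1f" % time_op(sess, grads))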

Averaging the speedup across all four networks, the results show an approximately 2.42x improvement in performance - not bad for an instance that is only about 1.39x more expensive on an hourly on-demand basis.
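To spell out the arithmetic, the per-network speedup is the G2 total time divided by the P2 total time from the tables above, and the 2.42x figure is the mean of those four ratios:

    # Per-network speedup = G2 total time / P2 total time (ms, from the tables above).
    g2 = {"AlexNet": 1416, "Overfeat": 4068, "VGG": 3300, "GoogleNet": 2187}
    p2 = {"AlexNet": 716, "Overfeat": 1274, "VGG": 1292, "GoogleNet": 1124}

    speedups = {net: g2[net] / float(p2[net]) for net in g2}
    for net in sorted(speedups):
        print("%-10s %.2fx" % (net, speedups[net]))
    print("average    %.2fx" % (sum(speedups.values()) / len(speedups)))

Running this prints speedups of roughly 1.98x for AlexNet, 3.19x for Overfeat, 2.55x for VGG, and 1.95x for GoogleNet, which average to about 2.42x.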

We have several other deep learning AMIs available in the AWS Marketplace, including Caffe, Chainer, Theano, Torch, and DIGITS. If you are interested in seeing GPU performance benchmarks for any of the above, drop us a note.

Are you currently developing AI applications but spending too much time wrangling machines and setting up your infrastructure? We are offering a Free 30-Day Trial of Bitfusion Flex!