by maciej on May 9, 2016

Easy TensorFlow Model Training on AWS

Google recently released TensorFlow 0.8, which, among other features, adds support for distributed computing. While this is great for power users, the most important step for most people getting started with machine learning or deep learning is simply having a powerful, pre-configured instance. To solve this problem, we recently released the Bitfusion Ubuntu 14 TensorFlow AMI, which ships with TensorFlow 0.8 and is configured to work equally well on CPU and GPU AWS instances.


Our TensorFlow AMI is packaged with the most recent NVIDIA drivers, the CUDA 7.5 Toolkit, and cuDNN 4. Additionally, we integrate Jupyter Notebook, which lets you experiment with TensorFlow code directly from your browser without ever having to log into the AWS instance (more on this in a bit). Finally, we pre-installed the course materials for the Udacity Deep Learning course so that you can start learning TensorFlow quickly.


Launching an AWS Instance

Launching an AWS instance with our TensorFlow AMI from the AWS Marketplace takes three easy steps:



1. If you don't already have an AWS account, click here. Then click "Create an AWS Account" and follow the prompts to create a new account.



2. Visit our TensorFlow AMI product page and click the big orange "Continue" button on the top right.



3. Select the EC2 instance type that you want to spin up.* Then, create a new key pair or select an existing one and launch the instance.**

*You can find a complete list of AWS instance types here. The default instance type is g2.8xlarge, since we like GPUs, but our AMI works on all instance types.
**If you have not created a key pair before, you can do so here; afterwards, return to the product launch page and refresh it.

You are all set. After you click the "Accept Software Terms & Launch with 1-Click" button, you will see a confirmation window with instructions on how to access the instance. Write them down for future reference, then dismiss the window.


Running Your First TensorFlow Notebook with Jupyter

Once you dismiss the launch window, you will see a page listing all your AWS software subscriptions. On this page, look for the Bitfusion Ubuntu 14 TensorFlow product and click its "Manage in AWS Console" link, which opens a new window. If you don't see the TensorFlow product, don't be alarmed: you may need to wait a minute or two and then refresh the software subscriptions page.


This new window shows information about the instance running the TensorFlow AMI. Here you can find the two key pieces of information you need to access the instance: the Instance ID and the Public IP.
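If you prefer to look these values up programmatically, the sketch below pulls the Instance ID and Public IP out of a response shaped like the one returned by boto3's EC2 `describe_instances` call. It assumes boto3 is installed and your AWS credentials are configured; the helper names `extract_access_info` and `lookup_instance` are hypothetical, and you supply your own instance ID and region.

```python
"""Hypothetical helpers for finding an EC2 instance's Instance ID and Public IP."""


def extract_access_info(response):
    """Return (instance_id, public_ip) pairs from a describe_instances-style response.

    An instance without a public IP yet (e.g. still pending) is reported with None.
    """
    pairs = []
    for reservation in response.get("Reservations", []):
        for instance in reservation.get("Instances", []):
            pairs.append((instance["InstanceId"], instance.get("PublicIpAddress")))
    return pairs


def lookup_instance(instance_id, region_name):
    """Query EC2 for one instance (assumes boto3 and configured AWS credentials)."""
    import boto3  # imported lazily so the parser above works without boto3

    ec2 = boto3.client("ec2", region_name=region_name)
    response = ec2.describe_instances(InstanceIds=[instance_id])
    return extract_access_info(response)
```

Keeping the parsing separate from the API call makes it easy to check against a saved response without touching AWS.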

To access Jupyter, enter the following URL into your favorite browser:

http://<Public IP>:8888
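If you would rather script this step, the sketch below builds the same URL from the Public IP and polls it until the notebook server answers, which can help right after launch while the instance is still booting. It uses only the Python 3 standard library; the function names are hypothetical, and 203.0.113.10 in the usage note is a documentation placeholder, not a real instance.

```python
"""Hypothetical helpers: build the Jupyter URL and wait for the server to answer."""
import time
import urllib.error
import urllib.request

JUPYTER_PORT = 8888  # port the AMI's Jupyter server listens on, per this post


def jupyter_url(public_ip, port=JUPYTER_PORT):
    """Return the address to open in a browser: http://<Public IP>:8888."""
    return "http://{}:{}".format(public_ip, port)


def wait_for_jupyter(url, timeout=120.0, interval=5.0):
    """Poll url until an HTTP server answers or timeout seconds pass.

    Returns True as soon as any HTTP response arrives (even an error status,
    which still means something is listening) and False if none ever does.
    """
    deadline = time.monotonic() + timeout
    while True:
        try:
            with urllib.request.urlopen(url, timeout=interval):
                return True
        except urllib.error.HTTPError:
            return True  # the server answered, just not with 200 OK
        except OSError:
            pass  # refused or timed out: the server is not reachable yet
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)
```

For example, `wait_for_jupyter(jupyter_url("203.0.113.10"))` returns True once the login page is reachable, after which you can open the URL in your browser.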

Next, you will be asked for a password: enter the Instance ID and press the "Log In" button. You will then see a screen like the one below, with a Udacity folder containing all the course materials. To keep things simple, we will create a TensorFlow Hello World notebook from scratch. Click the "New" button toward the top right and select "Python 2" under Notebooks.

[Screenshot: Jupyter login and new notebook]




Now enter the following code into the box next to In [ ]: and press the Run button at the top of the page. The first run may take a little while because the TensorFlow module is being imported (an "*" appears in In [*] while the cell is being evaluated), but eventually you will see the expected output. You may also see some DeprecationWarning messages relating to DisplayFormatter; these can be safely ignored.

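import tensorflow as tf
hello = tf.constant('Hello, TensorFlow!')  # adds a constant to the graph; nothing is computed yet
sess = tf.Session()                        # a session executes the graph
print(sess.run(hello))                     # evaluates the constant and prints Hello, TensorFlow!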

Now let's try some simple math. Enter the following code into the next input box and once again press the Run button.

a = tf.constant(10)
b = tf.constant(32)
print(sess.run(a + b))  # a + b adds an addition op to the graph; running it prints 42
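Neither snippet computes anything when the tf.constant lines run: in this version of TensorFlow you first build a computation graph, and values are produced only when a session runs part of it. The toy model below mimics that build-then-run pattern in plain Python 3 so you can see the idea without TensorFlow installed; the Node, Constant, Add, and ToySession classes are hypothetical and say nothing about how TensorFlow is actually implemented.

```python
"""A toy, pure-Python illustration of build-the-graph-then-run (not TensorFlow's code)."""


class Node:
    """A graph node; creating one computes nothing."""

    def __add__(self, other):
        # Like `a + b` on TensorFlow tensors, this only adds another node to the graph.
        return Add(self, other)


class Constant(Node):
    def __init__(self, value):
        self.value = value


class Add(Node):
    def __init__(self, left, right):
        self.left = left
        self.right = right


class ToySession:
    """Evaluates a node by walking the graph, and only when run() is called."""

    def run(self, node):
        if isinstance(node, Constant):
            return node.value
        if isinstance(node, Add):
            return self.run(node.left) + self.run(node.right)
        raise TypeError("unknown node type: {!r}".format(node))


if __name__ == "__main__":
    total = Constant(10) + Constant(32)  # just a graph node at this point
    print(ToySession().run(total))       # prints 42
```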

Wasn't that easy? Now you can go back and use the Udacity Notebooks to follow along with the Udacity Deep Learning Course.


Running TensorFlow from the Command Line

Running TensorFlow from the command line is also straightforward with our AMI. First, SSH into the instance with the following command. Make sure your pem file has the proper permissions: after you download it from AWS when creating the key pair, be sure to chmod it to 400.

ssh -i <path to your pem file> ubuntu@<Public IP>
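OpenSSH ignores a private key file that other users can read, which is why the mode-400 step matters. The sketch below applies that mode and assembles the same `ssh` command as an argument list; the `prepare_key` and `ssh_command` names are hypothetical, and it assumes a POSIX system (on Windows, `os.chmod` only toggles the read-only flag).

```python
"""Hypothetical helpers: lock down the .pem key and build the SSH command."""
import os
import stat


def prepare_key(pem_path):
    """chmod the key file to 400 (owner read-only) and return the resulting mode."""
    os.chmod(pem_path, stat.S_IRUSR)  # same effect as `chmod 400 <pem file>`
    return stat.S_IMODE(os.stat(pem_path).st_mode)


def ssh_command(pem_path, public_ip, user="ubuntu"):
    """Return the argument list for `ssh -i <pem file> ubuntu@<Public IP>`."""
    return ["ssh", "-i", os.path.expanduser(pem_path), "{}@{}".format(user, public_ip)]
```

For example, `ssh_command("~/keys/my-key.pem", "203.0.113.10")` (both values are placeholders) returns a list you could pass to `subprocess.run`.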

TensorFlow comes with various ready-to-use models and scripts. For example, you can train a convolutional neural network (CNN) on the MNIST dataset with the following command:

python ~/tensorflow/tensorflow/models/image/mnist/

How about a more advanced model that trains a CNN on the CIFAR-10 dataset? Once again, a single-line command will do the trick:

python ~/tensorflow/tensorflow/models/image/cifar10/

You may not want to wait for the command above to finish: with this CNN and a dataset of this size, training takes a few hours. By default, the model above trains on a single GPU, but this implementation can also run on multiple GPUs. Assuming you launched our AMI on a g2.8xlarge instance, you can run the same training on all 4 of its GPUs with the following command:

python ~/tensorflow/tensorflow/models/image/cifar10/ --num_gpus=4
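Roughly, the multi-GPU version of the CIFAR-10 example trains with synchronous data parallelism: each GPU ("tower") holds a replica of the model, processes its own batch, and computes gradients; the gradients from all towers are then averaged and a single update is applied to the shared parameters. The sketch below shows only that averaging-and-update step in plain Python, with lists of floats standing in for gradient tensors; the function names are hypothetical and this is not the example's actual code.

```python
"""Illustrative sketch of one synchronous data-parallel update (plain Python lists)."""


def average_tower_gradients(tower_grads):
    """Average gradients element-wise across towers.

    tower_grads[k][j] is the gradient of parameter j computed on tower (GPU) k.
    """
    if not tower_grads:
        raise ValueError("need gradients from at least one tower")
    num_towers = len(tower_grads)
    return [sum(grads) / num_towers for grads in zip(*tower_grads)]


def sgd_step(params, grads, learning_rate):
    """Apply one shared update so every tower starts the next step with the same parameters."""
    return [p - learning_rate * g for p, g in zip(params, grads)]
```

Because each tower's gradient is already a mean over its own batch, averaging equal-sized towers gives the same gradient as processing all of their examples at once, so each step with 4 GPUs behaves like one step with a 4× larger batch.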

The graph below compares performance as we scale the number of GPUs from 1 to 4 for the training run above. Scaling is generally good for this CNN, although the performance improvement is not quite linear in the number of GPUs, which suggests that beyond some point adding more GPUs will yield no additional performance gains. Keep in mind that performance scaling depends heavily on the CNN, the batch size, and the dataset, so you should always run your own experiments.

[Figure: CIFAR-10 multi-GPU training performance, 1 to 4 GPUs]
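To quantify scaling in your own experiments, compare throughput (for example, examples per second) at each GPU count: speedup is the n-GPU throughput divided by the 1-GPU throughput, and parallel efficiency is speedup divided by n. The sketch also includes Amdahl's law, a standard model (not something measured in this post) of why sub-linear scaling leads to diminishing returns: if only a fraction p of each step parallelizes, speedup can never exceed 1/(1 - p). The function names are hypothetical, and the numbers in the usage note are placeholders, not the results in the graph.

```python
"""Hypothetical helpers for summarizing multi-GPU scaling experiments."""


def scaling_report(throughput_by_gpus):
    """Map GPU count -> (speedup, efficiency) relative to the 1-GPU throughput.

    throughput_by_gpus: dict of {num_gpus: examples_per_second}; must include 1.
    """
    if 1 not in throughput_by_gpus:
        raise ValueError("a 1-GPU baseline measurement is required")
    baseline = throughput_by_gpus[1]
    report = {}
    for n, throughput in sorted(throughput_by_gpus.items()):
        speedup = throughput / baseline
        report[n] = (speedup, speedup / n)
    return report


def amdahl_speedup(parallel_fraction, n):
    """Amdahl's law: speedup on n devices when only parallel_fraction of the work scales."""
    return 1.0 / ((1.0 - parallel_fraction) + parallel_fraction / n)
```

With placeholder throughputs of 100, 180, and 320 examples/s on 1, 2, and 4 GPUs, `scaling_report` gives speedups of 1.8 and 3.2, i.e. 90% and 80% efficiency.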

Hopefully you are now convinced that getting started with TensorFlow using our AMI is simple and takes only a few minutes. In an upcoming post, we will demonstrate how to improve TensorFlow performance even further by scaling across multiple nodes. In the meantime, if you run into any problems or have any questions, please let us know.


Topics: tensorflow, tutorial

