Deep Learning Blog | Bitfusion

Bitfusion Boost 16 GPU Caffe Cluster

Written by maciej | Mar 11, 2016 10:32:00 PM

We recently published our Boost AMIs to the AWS Marketplace and walked through potential cluster configurations. Today, we are going to expand on that and set up a Bitfusion Boost cluster on AWS, configured specifically for the Caffe Deep Learning Framework. At the end of this tutorial, you will have a cluster consisting of:

  • One g2.8xlarge as a client where the application runs
  • Three g2.8xlarges as servers

This configuration will give your application a total of 16 GPUs!
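The 16-GPU total follows from the instance type: each g2.8xlarge provides four GPUs, and the client's own GPUs count alongside those of the three servers. A quick sketch of the arithmetic (the variable names are illustrative only and are not part of Boost):

```shell
# Each g2.8xlarge instance exposes 4 GPUs.
gpus_per_instance=4
clients=1   # the instance where the application runs
servers=3   # the Boost server instances

total_gpus=$(( (clients + servers) * gpus_per_instance ))
echo "Total GPUs visible to the application: $total_gpus"
```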

 

1. Subscribe to the Bitfusion AMIs

This walkthrough leverages AWS’s CloudFormation (CFN) templates. Using our template will enable you to get a Bitfusion Boost cluster up and running in minutes. In order to utilize the CloudFormation template, you need to be signed into the AWS console and be an active subscriber to the AMIs used in the template.

What does it mean to Subscribe to a Product?

Subscribing to a product means that you have accepted the terms of the product as shown on the product’s listing page, including pricing terms and the software seller’s End User License Agreement, and that you agree to use such product in accordance with the AWS Customer Agreement. All Bitfusion AMIs are priced on an hourly basis and you will only incur charges on top of the base AWS instance charges when the cluster is up and running – simply subscribing to one of our AMIs does not cost you anything.

WARNING: In Step 1 and Step 2 below, DO NOT use the “1-Click Launch” option, as this will immediately launch an instance. That is not required, because all instances are launched by the CFN template. For both AMIs below, make sure the “Manual Launch” tab is selected, then simply click on “Accept Software Terms.”

Step 1: Accept Bitfusion Boost Server Software Terms
Boost Server AMI Software Terms

 

Step 2: Accept Bitfusion Boost Caffe Client Software Terms
Boost Caffe Client AMI Software Terms

2. Create an AWS Key Pair

The AWS key pair uses public-key cryptography to provide secure login to your AWS cluster. You will need to create one to access the Bitfusion Client. If you have already created a key pair, you can reuse it and skip directly to Section 3.

Step 1: Select us-east-1 as your region

Our CFN template currently supports only us-east-1.

Step 2: Create and name your key pair

In the navigation pane, under “Network & Security”, select “Key Pairs”. Then choose the “Create Key Pair” button.

Step 3: Download and save the key pair

The key pair will automatically download. Make sure you keep this file, as it is required to log in to the client machine.
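If you prefer the command line to the console, the same key pair can be created with the AWS CLI. The following is a minimal sketch, assuming the AWS CLI is installed and configured with your credentials. The function name `create_key_pair` and the key name in the usage line are hypothetical. Restricting the file to owner-read-only also applies to a key downloaded from the console, since ssh rejects private keys that other users can read.

```shell
# Sketch: create an EC2 key pair in us-east-1 and save the private key locally.
# Assumes a configured AWS CLI; the helper name is hypothetical.
create_key_pair() {
  key_name="$1"
  key_file="${key_name}.pem"

  # The private key material is returned only once, at creation time.
  aws ec2 create-key-pair \
    --region us-east-1 \
    --key-name "$key_name" \
    --query 'KeyMaterial' \
    --output text > "$key_file" || { rm -f "$key_file"; return 1; }

  # ssh refuses to use a private key that is readable by other users.
  chmod 400 "$key_file"
  echo "$key_file"
}
```

Usage, with a hypothetical key name: `create_key_pair my-bitfusion-key` writes `my-bitfusion-key.pem` to the current directory and prints its name.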

3. Create a Bitfusion Boost Cluster

The Bitfusion Boost template is specifically configured for running a Boost Cluster. If you modify any of the AWS template configurations, you may be unable to run the cluster or tools.

Step 1: Accept the template

Accept the template already specified, and click “Next”.

Step 2: Specify the template parameters

On the specify details page, enter a name for your cluster (e.g. BitfusionCluster), accept the default parameters, select your “KeyName” and then click “Next”.

Step 3: Accept default options

On the options page, accept the defaults and click “Next”.

4. Launch the Cluster

Finish creating your AWS cluster and log in to the Bitfusion Boost client.

Step 1: Create the cluster

On the review page, check the box that allows CloudFormation to create the necessary IAM roles and click “Create”.

Step 2: Monitor Provisioning Process

The cluster stack spins up over a period of 10 to 15 minutes. Watch for the status to change from CREATE_IN_PROGRESS to CREATE_COMPLETE.  You may need to refresh the page to see the status change.
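Instead of refreshing the console, the status can also be polled from a terminal. The sketch below assumes a configured AWS CLI. `stack_status` and `wait_for_stack` are hypothetical helper names, and the stack name is whatever you entered when specifying the template parameters.

```shell
# Sketch: poll a CloudFormation stack until creation finishes.
# Assumes a configured AWS CLI; both helper names are hypothetical.
stack_status() {
  aws cloudformation describe-stacks \
    --region us-east-1 \
    --stack-name "$1" \
    --query 'Stacks[0].StackStatus' \
    --output text
}

wait_for_stack() {
  stack_name="$1"
  interval="${2:-30}"   # seconds between polls
  while :; do
    status="$(stack_status "$stack_name")" || return 1
    echo "status: $status"
    case "$status" in
      CREATE_COMPLETE)    return 0 ;;
      CREATE_IN_PROGRESS) sleep "$interval" ;;
      *)                  return 1 ;;   # any other state, e.g. a rollback
    esac
  done
}
```

Usage: `wait_for_stack BitfusionCluster`. The AWS CLI also ships a built-in waiter, `aws cloudformation wait stack-create-complete --stack-name BitfusionCluster`, which blocks without printing intermediate states.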

Step 3: Log in to the Bitfusion Client

From the Amazon EC2 console page, click on the client. Copy the IP address and log in via SSH: ssh -l ubuntu -i

5. Take It for a Spin

Once you have logged in you can query how many GPUs you have and test out Caffe.

How many GPUs do you have?

You can query the number of GPUs available to you with the following command:

bfboost client /usr/local/cuda-7.0/samples/bin/x86_64/linux/release/deviceQuery
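The deviceQuery output is fairly long, so a one-line count can be handy. The sketch below assumes that the output under Boost keeps the stock CUDA sample format, in which each GPU is introduced by a line beginning with `Device N:`. That is an assumption about the output, not documented Boost behavior, and `count_gpus` is a hypothetical helper name.

```shell
# Sketch: count GPUs by counting the "Device N:" header lines
# in deviceQuery output read from standard input.
count_gpus() {
  grep -c '^Device [0-9][0-9]*:'
}
```

Usage: `bfboost client /usr/local/cuda-7.0/samples/bin/x86_64/linux/release/deviceQuery | count_gpus`. On the cluster built in this tutorial, the count should match the 16 GPUs described above if the assumption about the output format holds.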

Caffe

Run the following commands to test out Caffe and see it running on all 16 GPUs:

cd /opt/caffe-gpu
./data/mnist/get_mnist.sh
./examples/mnist/create_mnist.sh
bfboost client "./build/tools/caffe train --solver=examples/mnist/lenet_solver.prototxt -gpu all"

For more information on using Boost, please refer to our official documentation.

6. Deleting the Cluster

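Select your cluster on the CloudFormation Management page and click “Delete Stack”. Because the AMIs are billed hourly while the cluster is running, delete the stack as soon as you are done with it. For more information, see Deleting a Stack on the AWS CloudFormation Console.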
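The stack can also be deleted from a terminal. This is a minimal sketch, assuming a configured AWS CLI. `delete_cluster` is a hypothetical helper name, and the stack name is the one you chose when creating the cluster.

```shell
# Sketch: delete the CloudFormation stack and wait until it is gone.
# Assumes a configured AWS CLI; the helper name is hypothetical.
delete_cluster() {
  stack_name="$1"
  aws cloudformation delete-stack \
    --region us-east-1 --stack-name "$stack_name" || return 1
  # Blocks until CloudFormation reports that the stack has been deleted.
  aws cloudformation wait stack-delete-complete \
    --region us-east-1 --stack-name "$stack_name"
}
```

Usage: `delete_cluster BitfusionCluster`.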