
Experience Bitfusion Boost on AWS

Written by maciej | Feb 23, 2016 5:52:42 PM

Recently we released the initial version of Bitfusion Boost, which can be installed standalone on existing Linux clusters or clouds. The installation is straightforward and requires only a single curl command. However, based on customer demand, and to lower the bar even further and get you up and running quickly, we decided to create several Amazon Machine Images (AMIs). With these AMIs you can create flexible virtual clusters on AWS that can be optimized for efficiency, performance, or a mix of both. You can find detailed instructions on how to get started with these AMIs on AWS in our Boost AWS Documentation. What follows below is a discussion of possible Boost cluster configurations and potential use cases.
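To make the discussion below concrete, here is a minimal sketch of bringing up one Boost Server instance and one Boost Client instance with the AWS SDK for Python (boto3). The AMI IDs, key pair, and security group names are placeholders for illustration only; substitute the actual values from the AWS Marketplace listings and your own account.

    import boto3

    ec2 = boto3.resource("ec2", region_name="us-east-1")

    # Placeholder AMI IDs; replace with the Bitfusion Boost AMI IDs
    # from the AWS Marketplace.
    BOOST_SERVER_AMI = "ami-00000000"
    BOOST_CLIENT_AMI = "ami-11111111"

    # Boost Server: a GPU instance; the server process starts on boot.
    server = ec2.create_instances(
        ImageId=BOOST_SERVER_AMI,
        InstanceType="g2.2xlarge",        # GPU instance type
        MinCount=1, MaxCount=1,
        KeyName="my-keypair",             # placeholder key pair
        SecurityGroups=["boost-cluster"]  # placeholder security group
    )[0]

    # Boost Client: a CPU-only instance that offloads GPU work to the server.
    client = ec2.create_instances(
        ImageId=BOOST_CLIENT_AMI,
        InstanceType="c4.xlarge",
        MinCount=1, MaxCount=1,
        KeyName="my-keypair",
        SecurityGroups=["boost-cluster"]
    )[0]

    print("server:", server.id, "client:", client.id)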

Many flexible configurations are possible with Bitfusion Boost; however, most of them fall into three distinct categories: many clients to one server (Many-to-One), one client to many servers (One-to-Many), and multiple servers, some of which also act as clients (Multi-Server). Most importantly, none of the configurations described below require any code changes to the GPU applications running on the client instances.

Many-to-One Cluster Configuration:

In a Many-to-One configuration there are multiple Boost Client instances which utilize a single Boost Server instance. This cluster configuration ensures maximum utilization of high-performance resources such as GPUs. A single Bitfusion Boost Server, deployed on an AWS GPU instance, can handle concurrent requests from multiple Boost Clients. Each of these clients can run a different GPU application, even though the clients themselves do not have local GPUs, allowing for very modular and dynamic cluster configurations.

When can such a configuration be useful? One example is a hybrid workload where some parts execute on a CPU while other parts are best executed on a GPU. In the above configuration, all workloads can be dispatched in parallel to the CPU hosts, since the client instances can be scaled up or down dynamically as needed. Then, whenever a GPU section of the workload is reached by a CPU client, it is automatically offloaded to the GPU server, yielding maximum throughput for all the workloads. This is significantly more efficient than dispatching all of these workloads on the GPU server directly; we will cover a detailed analysis of this scenario in a future blog post.
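As a rough illustration of the Many-to-One pattern, the sketch below (again using boto3, with the same placeholder AMI ID and names as before) adds more CPU-only client instances in front of a single GPU server as demand grows; scaling back down is simply a matter of terminating clients.

    import boto3

    ec2 = boto3.resource("ec2", region_name="us-east-1")
    BOOST_CLIENT_AMI = "ami-11111111"  # placeholder client AMI ID

    def scale_clients(count):
        """Launch 'count' additional CPU-only Boost Client instances."""
        return ec2.create_instances(
            ImageId=BOOST_CLIENT_AMI,
            InstanceType="c4.xlarge",
            MinCount=count, MaxCount=count,
            KeyName="my-keypair",             # placeholder key pair
            SecurityGroups=["boost-cluster"]  # placeholder security group
        )

    # Scale the client side up to four instances; they all share one GPU server.
    clients = scale_clients(4)

    # Scale back down when the CPU-heavy portion of the workload is finished.
    for instance in clients:
        instance.terminate()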

One-to-Many Cluster Configuration:

In a One-to-Many configuration there are multiple Boost Server instances which are utilized by a single Boost Client instance. This cluster configuration ensures maximum performance for the client application, as GPU resources are aggregated from multiple servers. Once again, the client instance can execute GPU applications even though it does not have a local GPU.

When might such a configuration be beneficial? Once again, consider a hybrid workload where some parts are to be executed on a CPU while other parts are best executed on a GPU. This configuration is particularly beneficial when the GPU part can take advantage of more GPUs than a single AWS instance can provide; currently the largest AWS GPU instance provides 4 x NVIDIA K520 GPUs. Using the Bitfusion Boost Server and Client AMIs, this configuration can be set up in minutes without the need for other frameworks such as MPI, Hadoop, or Spark.
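To give a sense of what the client application sees in a One-to-Many setup, here is a short sketch using pyopencl (assuming it is installed on the client instance) that simply enumerates the GPU devices visible to OpenCL; with several Boost Servers attached, the list would contain more GPUs than any single AWS instance offers. The same enumeration runs unmodified on a plain GPU instance.

    import pyopencl as cl

    # List every GPU device visible to the application on this client instance.
    gpu_count = 0
    for platform in cl.get_platforms():
        for device in platform.get_devices():
            if device.type & cl.device_type.GPU:
                print(device.name)
                gpu_count += 1

    print("Visible GPUs:", gpu_count)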

Multi-Server Cluster Configuration:

In a Multi-Server configuration only Boost Server instances are utilized; however, one of the server instances also acts as a client. In this configuration, the client, which has the application installed, utilizes its local GPUs as well as the GPUs in the remote server instances. To the application, this cluster configuration appears as a single virtual GPU node which is significantly more powerful than any GPU instance offered by AWS.

This configuration is particularly suitable for workloads which are dominated by GPU compute. Examples of these types of workloads include machine learning frameworks such as Torch, Caffe, and TensorFlow; rendering tools such as Blender, 3ds Max, and Maya; and various simulation tools for computational fluid dynamics, molecular dynamics, and quantum chemistry.
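As a concrete, hypothetical example of such a GPU-dominated workload, here is what a minimal Caffe training run looks like from the client's point of view; the solver definition path is a placeholder. Nothing in the script refers to Boost, which is the point: the same code runs whether the GPUs are local, remote, or a mix of both.

    import caffe

    # Use the first GPU visible to the application; under Boost this may be a
    # local or a remote GPU, but the application code is identical either way.
    caffe.set_device(0)
    caffe.set_mode_gpu()

    # 'solver.prototxt' is a placeholder path to your solver definition.
    solver = caffe.SGDSolver("solver.prototxt")
    solver.solve()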

Available AMIs:

Currently we have the following AMIs in the AWS Marketplace to get you up and running quickly (a sketch for looking them up programmatically follows this list):

  • Bitfusion Boost Server: This is the server AMI which is used to bring up GPU instances. All servers start automatically once the instance is up and no further configuration is required.
  • Bitfusion Boost Client: This is a clean client AMI pre-installed with all the required Boost collateral. You can install and deploy any OpenCL or CUDA application once this instance is up, and then quickly configure it to create one of the topologies described above.
  • Bitfusion Boost Client Caffe: This client AMI is pre-installed with the Caffe Machine Learning software so that you can take advantage of the above cluster configurations to train your neural networks faster.
  • Bitfusion Boost Client Torch: This client AMI is similar to the Caffe AMI, but instead comes pre-installed with the Torch 7 Machine Learning software.
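If you prefer to locate these AMIs programmatically rather than through the Marketplace web console, a boto3 lookup along the following lines should work; the name pattern is an assumption, so adjust it to match the actual Marketplace listing names.

    import boto3

    ec2 = boto3.client("ec2", region_name="us-east-1")

    # Search Marketplace-owned images whose name mentions Bitfusion
    # (the exact name pattern is an assumption; adjust as needed).
    response = ec2.describe_images(
        Owners=["aws-marketplace"],
        Filters=[{"Name": "name", "Values": ["*itfusion*"]}],
    )

    for image in response["Images"]:
        print(image["ImageId"], image["Name"])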

As always, if you have any questions or run into any problems, be sure to reach out to us and we will be glad to help out. We will be adding additional AMIs in the near future; in the meantime, you can click here to see all of our available AMIs in the AWS Marketplace.