by subbu on February 1, 2018

Announcing FlexDirect, which allows remote, dynamic attachment and fractional slicing of GPUs on any infrastructure

Application performance demands have increasingly been outpacing Moore’s law in a variety of fields, particularly AI and deep learning. Co-processors like GPUs offer immense speedups over CPUs for applications in these fields. At Bitfusion, we build technology to disaggregate co-processors like GPUs and re-aggregate them in real time over Ethernet, InfiniBand RDMA, or RoCE networks, creating an elastic AI infrastructure. Just as network-attached storage does for disks, our technology gives customers network-attached co-processors.

We love to listen and do our best to meet our customers’ needs. Today, Bitfusion is announcing the availability of FlexDirect, which lets customers use our technology more directly to attach and detach GPUs to and from workloads in real time, and to slice a GPU into virtual GPUs of any size, offering unprecedented GPU utilization. Out of the box, FlexDirect supports GPU applications written in CUDA. It also has extensions to support any OpenCL-compliant hardware (FPGAs and ASICs). It runs in user space and works with any public cloud, private cloud, or on-premises hardware, as well as any hypervisor or container environment.

GPU utilization in an organization or in the public cloud usually follows a choppy trend like the one below.

GPU Virtualization.png

FlexDirect lets you use underutilized GPU compute cycles more efficiently by allowing real-time aggregation and disaggregation of GPUs, thereby creating pools of GPUs. For instance, you can keep your workloads on CPU machines most of the time and remotely attach a GPU only when a workload needs one, increasing GPU utilization by 2-4x.
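
To see where a gain of that size can come from, here is a rough back-of-the-envelope sketch in Python. The workload count, busy fraction, and pool size are made-up illustrative numbers, not Bitfusion measurements, and the `utilization` helper is hypothetical. It also assumes demand is spread evenly over time, which real workloads rarely manage.

```python
def utilization(num_workloads: int, busy_fraction: float, num_gpus: int) -> float:
    """Average fraction of GPU time in use across `num_gpus` GPUs.

    Assumes each workload needs a full GPU for `busy_fraction` of the time
    and that demand never exceeds the number of GPUs available.
    """
    return num_workloads * busy_fraction / num_gpus


# Hypothetical scenario: 8 workloads, each needing a GPU 25% of the time.
workloads = 8
busy = 0.25

# One dedicated GPU per workload.
dedicated = utilization(workloads, busy, num_gpus=workloads)

# A shared pool of 3 GPUs, attached remotely only while a workload is busy.
pooled = utilization(workloads, busy, num_gpus=3)

print(f"dedicated: {dedicated:.0%}, pooled: {pooled:.0%}, gain: {pooled / dedicated:.2f}x")
```

With these assumed numbers the pooled setup lands inside the 2-4x range, but the actual gain depends entirely on how bursty your workloads are and how much headroom you keep in the pool.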


GPU 2.png

In addition to on-prem CPU servers and private clouds, FlexDirect operates in any public cloud. Here is an example of how FlexDirect would work with AWS EC2 instances. Instead of dedicating a GPU instance like p2.8xlarge to a single user or workload (whether or not the GPU is running a live application), you use a CPU instance like r4.8xlarge and timeshare a GPU instance like p2.8xlarge only when running live CUDA workloads. The Bitfusion layer exposes remote GPUs as if they were directly attached to the CPU system. The application developer doesn't need to do anything special in this case; they simply develop and deploy as if they were working on a larger GPU instance. In fact, the GPU workload is not aware that the GPU is remotely attached: it simply sees the remotely attached p2.8xlarge GPUs as if they were locally attached to the r4 instance.

Screen Shot 2018-01-30 at 5.37.52 PM.png

Not only does FlexDirect let you attach GPUs to any machine remotely, reducing total cost of ownership, it also lets you slice a single GPU into multiple virtual GPUs of any size. This provides increased performance along with increased utilization, because more workloads can be packed to run in parallel on the same GPU.

GPU Server.png  Graph.png
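
To make the packing idea concrete, here is a small Python sketch that places fractional GPU requests onto as few physical GPUs as possible. This is only an illustration: it uses a generic first-fit-decreasing heuristic, not FlexDirect's actual placement logic, and the `pack_workloads` function, the workload names, and the fractions are all hypothetical.

```python
from fractions import Fraction


def pack_workloads(demands, capacity=Fraction(1)):
    """Group workloads onto GPUs with first-fit decreasing.

    `demands` maps a workload name to the fraction of one GPU it needs.
    Returns a list of GPUs, each a list of the workload names placed on it.
    """
    gpus = []  # workload names per GPU
    free = []  # remaining fraction per GPU
    # Place the largest requests first so small ones fill the leftover gaps.
    for name, frac in sorted(demands.items(), key=lambda kv: kv[1], reverse=True):
        if frac > capacity:
            raise ValueError(f"{name} needs more than one GPU")
        for i, room in enumerate(free):
            if frac <= room:
                gpus[i].append(name)
                free[i] -= frac
                break
        else:
            # No existing GPU has room, so start a new one.
            gpus.append([name])
            free.append(capacity - frac)
    return gpus


# Hypothetical workloads and the share of a GPU each one needs.
demands = {
    "training": Fraction(1, 2),
    "inference-a": Fraction(1, 4),
    "inference-b": Fraction(1, 4),
    "notebook": Fraction(1, 3),
}

placement = pack_workloads(demands)
for index, names in enumerate(placement):
    print(f"GPU {index}: {', '.join(names)}")
```

In this made-up case, four workloads that would otherwise each occupy a whole GPU fit on two. Exact fractions are used instead of floats so that slices summing to a full GPU are not rejected because of rounding.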

FlexDirect operates via an intuitive command line interface (CLI) and can be integrated easily into any existing environment.

Customers can go to to get started with FlexDirect right away.

