Application performance demands have increasingly been outpacing Moore’s law in a variety of fields, particularly AI and deep learning. Co-processors like GPUs offer immense speedups over CPUs for applications in these fields. At Bitfusion, we build technology to disaggregate co-processors like GPUs and re-aggregate them in real time over Ethernet, InfiniBand RDMA, or RoCE networks, to create an elastic AI infrastructure. Just as network-attached storage does for disks, our technology gives customers network-attached co-processors.
We love to listen and do our best to meet our customers’ needs. Today, Bitfusion is announcing the availability of FlexDirect, which lets customers use our technology more directly to attach and detach GPUs to and from workloads in real time, and to slice a GPU into virtual GPUs of any size, offering unprecedented GPU utilization. Out of the box, FlexDirect supports GPU applications written in CUDA. It also has extensions to support any OpenCL-compliant hardware (FPGAs and ASICs). It runs in user space and works with any public cloud, private cloud, or on-premises hardware, as well as any hypervisor or container environment.
GPU utilization in an organization or in the public cloud usually follows a choppy trend, as in the graph below:
FlexDirect lets you use underutilized GPU compute cycles more efficiently by aggregating and disaggregating GPUs in real time, thereby creating pools of GPUs. For instance, you can keep your workloads on CPU machines most of the time and remote-attach a GPU only when the workload needs one, increasing GPU utilization by 2-4x.
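To make the pooling arithmetic concrete, here is a minimal sketch of how utilization changes when GPUs are shared instead of dedicated. The hourly demand traces are made up for illustration and are not measurements from FlexDirect; the function name `utilization` is likewise hypothetical.

```python
def utilization(demand_per_user, gpus):
    """Fraction of available GPU-hours that are actually busy."""
    hours = len(demand_per_user[0])
    busy = sum(sum(trace) for trace in demand_per_user)
    return busy / (gpus * hours)


# Hypothetical hourly traces: 1 means that user needs a GPU during that hour.
traces = [
    [1, 1, 0, 0, 0, 0, 0, 0],
    [0, 1, 1, 0, 0, 0, 0, 0],
    [0, 0, 0, 0, 1, 1, 0, 0],
    [0, 0, 0, 0, 0, 1, 1, 0],
]

# Dedicated model: every user holds one GPU for the whole period.
dedicated_gpus = len(traces)

# Pooled model: size the pool for the peak number of simultaneous requests.
pooled_gpus = max(sum(hour) for hour in zip(*traces))

dedicated = utilization(traces, dedicated_gpus)
pooled = utilization(traces, pooled_gpus)

print(f"dedicated: {dedicated_gpus} GPUs at {dedicated:.0%} utilization")
print(f"pooled:    {pooled_gpus} GPUs at {pooled:.0%} utilization")
```

With these assumed traces, four dedicated GPUs sit idle most of the time, while a pool of two serves the same demand at twice the utilization. The real gain depends entirely on how much the workloads' GPU demand overlaps.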
FlexDirect operates in any public cloud, in addition to on-prem CPU servers and private clouds. Below is an example of how FlexDirect would work with AWS EC2 instances. Instead of dedicating a GPU instance like p2.8xlarge to a single user or workload (whether or not the GPU is running a live application), you use a CPU instance like r4.8xlarge and timeshare a GPU instance like p2.8xlarge only when running live CUDA workloads. The Bitfusion layer exposes remote GPUs as if they were directly attached to the CPU system. The application developer doesn't need to do anything special in this case; they simply develop and deploy as if they were working on a larger GPU instance. In fact, the GPU workload is not aware that the GPU is remotely attached; it sees the remotely attached p2.8xlarge GPUs as if they were locally attached to the r4 instance.
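The trade-off between dedicated and timeshared GPU instances can be sketched as a simple cost model. The hourly rates below are placeholders in arbitrary units, not actual AWS prices, and the user count and number of shared GPU instances are assumptions chosen only to show the shape of the calculation.

```python
def dedicated_cost(users, gpu_rate, hours):
    """Every user keeps a GPU instance running for the whole period."""
    return users * gpu_rate * hours


def shared_cost(users, cpu_rate, gpu_rate, hours, shared_gpu_instances):
    """Every user runs on a CPU instance; a few GPU instances are timeshared."""
    return users * cpu_rate * hours + shared_gpu_instances * gpu_rate * hours


# Placeholder values (hypothetical, arbitrary cost units per hour).
CPU_RATE = 2.0
GPU_RATE = 7.0
USERS = 4
HOURS = 100
SHARED_GPU_INSTANCES = 1

dedicated_total = dedicated_cost(USERS, GPU_RATE, HOURS)
shared_total = shared_cost(USERS, CPU_RATE, GPU_RATE, HOURS, SHARED_GPU_INSTANCES)

print(f"dedicated GPU instances: {dedicated_total:.0f}")
print(f"CPU instances + shared GPU: {shared_total:.0f}")
```

Whether sharing comes out cheaper depends on the real price ratio between the CPU and GPU instance types and on how many GPU instances the combined workloads need at peak, so plug in your own figures before drawing conclusions.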
Not only does FlexDirect let you attach GPUs to any machine remotely, reducing total cost of ownership, it also lets you slice a single GPU into multiple virtual GPUs of any size. Packing more workloads to run in parallel on the same GPU increases both performance and utilization.
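The effect of slicing on how many physical GPUs you need can be illustrated with a small packing sketch. This is not FlexDirect's scheduler, whose placement logic is not described here; it is a generic first-fit-decreasing heuristic, and the slice sizes (expressed as a percentage of one GPU) are hypothetical.

```python
def pack_slices(requests, capacity=100):
    """Place each slice request (percent of one GPU) using first-fit decreasing.

    Returns a list of GPUs, each represented by the list of slice sizes on it.
    """
    gpus = []
    for size in sorted(requests, reverse=True):
        if not 0 < size <= capacity:
            raise ValueError(f"slice size {size} does not fit on one GPU")
        for slices in gpus:
            if sum(slices) + size <= capacity:
                slices.append(size)
                break
        else:
            # No existing GPU has room, so start filling a new one.
            gpus.append([size])
    return gpus


# Hypothetical workloads, each needing this percentage of a GPU.
requests = [50, 25, 25, 30, 70, 10]
placement = pack_slices(requests)

for index, slices in enumerate(placement):
    print(f"GPU {index}: slices {slices} ({sum(slices)}% used)")
```

For these assumed requests, six workloads that would otherwise each occupy a whole GPU fit on three. A production scheduler would also have to account for memory limits and contention between co-located workloads, which this sketch ignores.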
FlexDirect operates via an intuitive command line interface (CLI) and can be integrated easily into any existing environment.
Customers can go to https://bitfusion.io/product/flexdirect/ to get started with FlexDirect right away.