The rCUDA Team is pleased to announce the release of rCUDA 4.1. In addition to fixing some bugs related to asynchronous memory transfers, the new release provides support for:
- CUDA 5.5 Runtime API
- Mellanox Connect-IB network adapters
- cuFFT and cuBLAS libraries
Performance tests carried out by the rCUDA team, leveraging the new Connect-IB cards along with the new NVIDIA Tesla K40 GPU, revealed that rCUDA provides almost the same bandwidth as native CUDA. Our testbed consisted of two Ivy Bridge Xeon-based computers equipped with the new dual-port Mellanox Connect-IB cards, with a Tesla K40 GPU installed in one of the systems. In this hardware configuration, the maximum data transfer rate achieved by the bandwidthTest benchmark from the NVIDIA CUDA Samples when using local CUDA was 10.06 GB/s. When using rCUDA to transfer data from the main memory of one computer to the GPU installed in the other, the same bandwidthTest benchmark achieved a maximum data transfer rate of 9.91 GB/s. This means that rCUDA attains 98.5% of CUDA's bandwidth, introducing a negligible performance loss.
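As a quick sanity check, the 98.5% figure follows directly from the two measurements quoted above (a minimal sketch; the variable names are ours, and the values are simply the GB/s numbers reported in this announcement):

```python
# Figures reported in the announcement, in GB/s
local_cuda_bw = 10.06  # bandwidthTest with local CUDA
rcuda_bw = 9.91        # bandwidthTest over rCUDA with Connect-IB

# Relative bandwidth attained by rCUDA vs. local CUDA
ratio = rcuda_bw / local_cuda_bw * 100
print(f"rCUDA attains {ratio:.1f}% of local CUDA bandwidth")  # → 98.5%
```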
Presentation titled "rCUDA: Share and Aggregate GPUs in Your Cluster" at the Mellanox theatre at SC13 in Denver
We are pleased to provide you with the link to the rCUDA presentation given at the Mellanox theatre during the Supercomputing Conference 2013 in Denver, CO, last November. The presentation, titled "rCUDA: Share and Aggregate GPUs in Your Cluster", provided a quick overview of the main features of rCUDA. You can find the presentation here.