The use of GPUs to accelerate general-purpose scientific and engineering applications is mainstream today, but their adoption in current high-performance computing clusters is hindered primarily by acquisition costs and power consumption. Furthermore, GPU utilization is generally low, so the investment in GPU hardware cannot be amortized quickly.
Virtualizing GPUs is an appealing strategy for dealing with all these drawbacks simultaneously. With GPU virtualization, physical GPUs are installed only in some nodes of the cluster and are transparently shared among all the nodes. The nodes equipped with GPUs thus become servers that provide GPU services to every node in the cluster. GPU virtualization allows the cluster to use fewer GPUs, reducing acquisition costs and power consumption while increasing the accelerator utilization rate. Consequently, GPU virtualization enables a more efficient use of the available hardware. Moreover, a single application running in one node of the cluster can be given access to all the GPUs installed in the cluster. This number of GPUs is usually much larger than the number that can fit into a single box. Therefore, GPU virtualization can accelerate applications further.
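The utilization argument above can be sketched with a little arithmetic. The following is a minimal illustration, not a measurement: the cluster size, the per-GPU utilization, and the function name `shared_gpu_utilization` are all hypothetical, and the model ignores contention and network overhead.

```python
def shared_gpu_utilization(total_demand_gpus: float, installed_gpus: int) -> float:
    """Average utilization when an aggregate GPU demand is spread evenly
    over the installed GPUs (idealized: no contention, no network cost)."""
    if installed_gpus <= 0:
        raise ValueError("installed_gpus must be positive")
    return min(1.0, total_demand_gpus / installed_gpus)


# Hypothetical cluster: 16 nodes, one GPU each, each GPU busy 25% of the time.
nodes = 16
per_gpu_utilization = 0.25
demand = nodes * per_gpu_utilization  # equivalent to 4 fully busy GPUs

before = shared_gpu_utilization(demand, installed_gpus=16)
after = shared_gpu_utilization(demand, installed_gpus=8)

print(f"one GPU per node: {before:.0%} average utilization")
print(f"8 shared GPUs:    {after:.0%} average utilization")
```

Under these assumed numbers, halving the installed GPUs doubles the average utilization of those that remain, which is the effect the paragraph describes.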
The rCUDA framework is the most modern remote GPU virtualization solution today. It is developed by the Parallel Architectures Group at Universitat Politècnica de València (Spain). rCUDA enables the concurrent remote use of CUDA-enabled devices in a transparent way: the source code of applications does not need to be modified to use remote GPUs, because rCUDA takes care of all the necessary details. Furthermore, the overhead introduced by using a remote GPU is very small: execution time usually increases by less than 4% when a high-performance network fabric is used. rCUDA can be useful in three different environments:
- Clusters. rCUDA allows a single non-MPI application to use all the GPUs in the cluster, regardless of the node where they are installed. Additionally, rCUDA makes it possible to adjust the number of GPUs in the cluster to the actual computing needs, leading to increased GPU utilization and reduced overall costs (energy, acquisition, maintenance, space, cooling, etc.).
- Academia. Over commodity networks, rCUDA gives many students concurrent access to a few high-performance GPUs, thus reducing teaching costs.
- Virtual Machines. rCUDA allows applications running inside virtual machines to access GPUs installed in remote physical machines.
rCUDA is fully compatible with CUDA. It implements all of the functions in the CUDA Runtime API and Driver API, excluding only those related to graphics interoperability. It additionally includes highly optimized TCP and low-level InfiniBand pipelined communications, as well as full multi-thread and multi-node capabilities. rCUDA targets the same Linux distributions as CUDA does and supports both x86 and ARM processor architectures. Furthermore, an integration of rCUDA with the SLURM scheduler has been developed, allowing scheduled jobs to use remote GPUs. The combination of SLURM and rCUDA reduces the overall execution time of job batches by between 25% and 45%, depending on the exact composition of the job batch. Energy consumption is also noticeably reduced.
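To make the reported 25% to 45% range concrete, the short sketch below applies it to a job batch. The one-hour baseline and the helper name `reduced_batch_time` are assumptions chosen only for illustration; only the two percentages come from the text above.

```python
def reduced_batch_time(baseline_seconds: float, reduction: float) -> float:
    """Batch execution time after applying a fractional reduction (0 <= r < 1)."""
    if not 0.0 <= reduction < 1.0:
        raise ValueError("reduction must be in [0, 1)")
    return baseline_seconds * (1.0 - reduction)


# Hypothetical baseline: a job batch that takes one hour without remote GPUs.
baseline = 3600.0

smallest_gain = reduced_batch_time(baseline, 0.25)  # 25% reduction
largest_gain = reduced_batch_time(baseline, 0.45)   # 45% reduction

print(f"baseline:        {baseline / 60:.0f} min")
print(f"with SLURM + rCUDA: between {largest_gain / 60:.0f} and "
      f"{smallest_gain / 60:.0f} min")
```

For this assumed baseline, the batch would finish in roughly 33 to 45 minutes instead of 60; where a real batch falls in that range depends on its composition.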
To date, rCUDA has been successfully tested with several applications selected from NVIDIA's list of “Popular GPU-accelerated Applications”. In addition to behaving correctly with the NVIDIA SDK samples, rCUDA has been applied to the following applications: LAMMPS, WideLM, CUDASW++, HOOMDBlue, mCUDA-MEME, GPU-Blast, Gromacs, GAMESS, DL-POLY, and HPL. The papers and presentations available in the documentation page provide additional information about the performance of many of these applications when used with rCUDA.
If you are interested in using rCUDA, please proceed to the software request form page. The rCUDA team will be glad to send you a copy of the software at no cost. Note that the rCUDA technology is owned by the Technical University of Valencia and that the software request form page is the only way to get a copy of it.
For further information, please refer to the papers and presentations listed in the documentation page.
The rCUDA Team
The rCUDA Team is affiliated with Universitat Politècnica de València (Technical University of Valencia) in Spain. Federico Silla has led the team since 2008, when development of the rCUDA technology began. Different members of the team carry out different tasks. Currently, the team is composed of the following members:
Leadership & Coordination:
Federico Silla. Associate Professor.
Development & Testing:
Antonio Diaz. Testing.
Pablo Higueras. Developer.
Javier Prades. Developer.
Carlos Reaño. Senior Developer.
Jaime Sierra. Developer.
José Duato. Full Professor.
Antonio Peña (left the rCUDA Team in 2011)