In Part 1 of our series, we detailed the hardware choices and benchmarked the various GPUs and CPUs used in the HPU4Science scientific computation cluster. The cluster is built in a master-worker configuration in which the master dispatches jobs to the workers, compiles and processes the results, and handles data storage. The master is equipped with dual Intel Xeon processors, a four-SSD RAID array for short-term storage, and an array of five 2TB hard drives for archival storage. Networking is simple Gigabit Ethernet.
Currently, there are three workers in the cluster running Intel i7 or Core 2 Quad processors and using GPUs for highly parallelized computation. In the previous article, the third and newest worker had four GTX 580s that deliver four TFLOPS of measured peak computational performance (this equates to six TFLOPS of theoretical performance, which is the measure used for the Top500 supercomputer list). The hardware for a fourth worker with the same configuration as the third has just arrived, so the cluster will soon comprise a total of four workers with eight GTX 580s, three GTX 480s, three GTX 285s, a Tesla C1060 GPU, and a GTX 295 dual GPU. The estimated computational power of the whole system is 20 TFLOPS in theory and 12.5 TFLOPS in practice. Brand-new GTX 590s are currently on order for a fifth worker, so the total computational power is still increasing.
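As a rough sanity check on that 20 TFLOPS figure, you can tally NVIDIA's published single-precision peak numbers for each card against the inventory above. The per-card figures below are our assumptions from vendor spec sheets, not numbers from the cluster itself, and the counting convention (whether the extra MUL unit is included) shifts them noticeably, so the sum is only a ballpark:

```python
# Hypothetical back-of-the-envelope tally of the cluster's theoretical peak.
# Per-card single-precision figures (GFLOPS) are assumed NVIDIA spec-sheet
# values, not measurements from the HPU4Science cluster.
gpu_peak_gflops = {
    "GTX 580": 1581,
    "GTX 480": 1345,
    "GTX 285": 1063,
    "Tesla C1060": 933,
    "GTX 295": 1788,  # dual-GPU card, combined figure
}

# Planned inventory once the fourth worker is assembled.
inventory = {
    "GTX 580": 8,
    "GTX 480": 3,
    "GTX 285": 3,
    "Tesla C1060": 1,
    "GTX 295": 1,
}

total_tflops = sum(gpu_peak_gflops[g] * n for g, n in inventory.items()) / 1000
print(f"Theoretical peak: ~{total_tflops:.1f} TFLOPS")
```

The sum lands in the low twenties, in the same ballpark as the article's 20 TFLOPS estimate; the gap between theoretical and measured performance (12.5 TFLOPS in practice) is the usual story of memory bandwidth and instruction-mix limits keeping real workloads below spec-sheet peaks.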
Obviously, a cluster of this scale requires careful software selection to maximize the performance of the hardware. In this article, we detail the software choices for the HPU4Science cluster and discuss the areas where software and performance collide.