At the conference “The many faces of QCD” (see here, here and here) I had the opportunity to talk with people doing lattice computations at large computer facilities. They told me that this kind of activity implies the use of large computers, user queues (as these resources are generally shared) and months of computation before seeing the results. Today the situation is changing for the better thanks to an important technological shift. It is well known that graphics cards are built around graphical processing units (GPUs) made of many computational cores that work in parallel. Each core performs very simple computational tasks but, thanks to the parallel architecture, very complex operations can be decomposed into a set of such small tasks that are executed in an exceptionally short time. This is why a PC equipped with such an architecture can produce very complex video output with exceptionally good performance.
People at Nvidia had the idea of using these cores for floating-point operations, and hence for scientific computation. This is how CUDA (Compute Unified Device Architecture) was born. The first Tesla cards, with GPUs but without graphics output, were produced, and the development toolkit was made freely available. Nvidia made parallel computation available to the masses: just by mounting a graphics card with the CUDA architecture, anybody can have a desktop computer with teraflops performance!
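To give a flavor of this programming model, here is a minimal, generic sketch (not code from any lattice package): each GPU thread performs one tiny task, and the hardware runs thousands of them in parallel.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Each thread performs one tiny task -- here, adding a single pair of
// numbers -- and thousands of such threads run in parallel on the GPU cores.
__global__ void vec_add(const float *a, const float *b, float *c, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) c[i] = a[i] + b[i];
}

int main() {
    const int n = 1 << 20;
    const size_t bytes = n * sizeof(float);
    float *ha = new float[n], *hb = new float[n], *hc = new float[n];
    for (int i = 0; i < n; ++i) { ha[i] = 1.0f; hb[i] = 2.0f; }

    float *da, *db, *dc;  // device (GPU) buffers
    cudaMalloc(&da, bytes); cudaMalloc(&db, bytes); cudaMalloc(&dc, bytes);
    cudaMemcpy(da, ha, bytes, cudaMemcpyHostToDevice);
    cudaMemcpy(db, hb, bytes, cudaMemcpyHostToDevice);

    // Launch a grid of n threads, 256 per block.
    vec_add<<<(n + 255) / 256, 256>>>(da, db, dc, n);
    cudaMemcpy(hc, dc, bytes, cudaMemcpyDeviceToHost);
    printf("c[0] = %g\n", hc[0]);

    cudaFree(da); cudaFree(db); cudaFree(dc);
    delete[] ha; delete[] hb; delete[] hc;
    return 0;
}
```

The kernel syntax here is plain CUDA C as shipped with the toolkit; the same pattern of "one thread per site" is what makes lattice computations map so naturally onto GPUs.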
As soon as I became aware of the existence of CUDA I decided to jump on this bandwagon, which opened to me the opportunity to do lattice QCD at home. So, I upgraded my home PC with a couple of 9800 GX2 cards (2 GPUs each, with 512 MB of DDR3 RAM per GPU) supporting CUDA compute capability 1.1. This means that each card can do single-precision computations at about 1 Tflops, so my PC can reach a performance of 2 Tflops, but with no double precision. I also changed my motherboard to an Nvidia 790i Ultra, which supports 3-way SLI, and upgraded the power supply to 1 kW (a Silent Gold Cooler Master). I added 4 GB of DDR3 RAM and kept my CPU, an Intel Core 2 Duo E8500 running at 3.16 GHz per core. The interesting point about this configuration is that I bought the three Nvidia cards on eBay as used hardware at a very low cost. So I was in business for very few bucks!
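Compute capability 1.1 is exactly what rules out double precision (hardware doubles arrived with compute capability 1.3). The CUDA runtime lets you query what a given card supports; a small sketch:

```cuda
#include <cstdio>
#include <cuda_runtime.h>

int main() {
    int count = 0;
    cudaGetDeviceCount(&count);
    for (int d = 0; d < count; ++d) {
        cudaDeviceProp prop;
        cudaGetDeviceProperties(&prop, d);
        // Compute capability 1.3 or higher is needed for double precision;
        // a 9800 GX2 reports 1.1, hence single precision only.
        printf("Device %d: %s, compute capability %d.%d, %zu MB global memory\n",
               d, prop.name, prop.major, prop.minor,
               prop.totalGlobalMem / (1024 * 1024));
    }
    return 0;
}
```

On a machine with two 9800 GX2 cards this should list four devices, one per GPU.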
Before this upgrade my machine ran Windows XP Home 32-bit. This operating system can only address 3 GB of RAM, and 1 GB of it was used by the two graphics cards. This turned out to be a serious drawback for the whole endeavor; in a moment I will explain what I did to overcome it.
The next important step was to obtain CUDA code for QCD. The point is that CUDA technology is spreading rapidly through the academic environment and a lot of code is available. Initially I thought of the MILC code: CUDA code exists for it, and the people of the MILC Collaboration were very helpful. But this code is built for Linux, and I was not able to get that operating system up and running on my platform. Besides, I would have needed a lot of time to get all that code working for me, so I had to give up, despite myself. Meanwhile, a couple of papers by Pedro Bicudo and Nuno Cardoso appeared (see here and here). Pedro was a nice companion at the conference “The many faces of QCD”, where I had the opportunity to meet him. He was not aware that I had asked his student Nuno for the source code. Nuno was very kind to send me the link and I downloaded the code. This was a sound starting point for the work on my platform. The code was written for CUDA from the start and is well optimized. Pedro told me that the optimization phase cost them a lot of work, while putting down the initial code was relatively easy. They worked on a Linux platform, so he was surprised when I told him that I intended to port their code to Microsoft Windows. But this is my home PC, the whole family uses it, and my attempt to install Ubuntu 64-bit was a failure that cost me the use of the Windows installation disk to remove the dual boot.
Then, during my Christmas holidays, when I had a lot of time, I started to port Pedro and Nuno's code to Windows XP Home. It was very easy: their code, entirely written in C++ and CUDA, needed just the insertion of a define. So, setting the path in a DOS box and using nvcc with Visual Studio 2008 (the only compiler Nvidia supports under Windows so far), I obtained a running executable, but with a glitch: it could only run on my CPU. The reason was that I did not have enough memory under Windows XP 32-bit to complete the compilation of the code for the graphics cards. Indeed, Nvidia's ptxas compiler stopped with an error, and I was not able to get the code running on the GPUs of my computer. Still, after this partially successful step, I wrote to Pedro and Nuno informing them that I had ported the code, at least for the CPU, under Windows. The code was written so well that very little was needed to port it! Pedro told me that something in my machine had to change: mostly, the graphics cards should be more powerful. I am aware of this shortcoming, but my budget was not so good at the time. This is surely my next upgrade (a couple of GTX 580 cards with the Fermi architecture, which supports double precision).
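I do not have their exact macro name at hand, but the kind of single define involved in such a port is typically a platform guard; a purely illustrative sketch of the pattern (all names here are hypothetical, not taken from their code):

```cuda
// Hypothetical platform guard -- the actual macro in the ported code may differ.
#ifdef _WIN32
  #define NOMINMAX            // keep windows.h from clobbering std::min/std::max
  #include <windows.h>
  #define PATH_SEP "\\"
#else
  #include <unistd.h>
  #define PATH_SEP "/"
#endif
```

When a code base is written in portable C++/CUDA from the start, as theirs was, this kind of guard is often the only OS-specific piece needed.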
As I had experienced memory problems, the next step was to move to a 64-bit operating system in order to use all of my 4 GB of RAM. So, on another disk of my computer, I installed Windows 7 Ultimate 64-bit. Also in this case the porting of Pedro and Nuno's code was very smooth: in a DOS box I got their code up and running again, this time on my graphics cards and not just on the CPU. When I have the time I will compute some observables of SU(2) QCD, probing the limits of my machine. But this result is from yesterday and I need more time to do some physics.
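As background for the SU(2) case: an SU(2) link variable can be stored as four reals (a0, a1, a2, a3) with a0² + a1² + a2² + a3² = 1, representing U = a0·1 + i(a1σ1 + a2σ2 + a3σ3), so the group product reduces to a quaternion-style multiplication that each GPU thread can do independently. A hedged sketch, not taken from Nuno and Pedro's code (the signs depend on the convention chosen):

```cuda
#include <cuda_runtime.h>

// An SU(2) element stored as 4 reals in a float4:
// U = u.w * 1 + i (u.x s1 + u.y s2 + u.z s3), with |u| = 1.
// The group product is then a quaternion-style multiplication;
// the sign pattern below is one consistent convention.
__host__ __device__ float4 su2_mul(float4 a, float4 b) {
    float4 c;
    c.w = a.w * b.w - a.x * b.x - a.y * b.y - a.z * b.z;
    c.x = a.w * b.x + a.x * b.w - a.y * b.z + a.z * b.y;
    c.y = a.w * b.y + a.y * b.w - a.z * b.x + a.x * b.z;
    c.z = a.w * b.z + a.z * b.w - a.x * b.y + a.y * b.x;
    return c;
}

// One thread per lattice link: multiply every link by a fixed element g
// (e.g. part of a gauge transformation). Illustrative kernel only.
__global__ void gauge_mul(float4 *links, float4 g, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) links[i] = su2_mul(g, links[i]);
}
```

Storing a link as a single float4 instead of a 2×2 complex matrix halves the memory traffic, which matters a lot on cards with 512 MB per GPU.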
Pedro informed me that they are working on SU(3), which is more difficult. Meanwhile, I have to thank him and his student Nuno very much for the very good job they did and for making it possible for me to have lattice QCD successfully running on my home computer. I hope this will be a good starting point for other people doing this kind of research.
Update: Pedro authorized me to put here the link to download the code. Here it is. Thank you again Pedro!
Nuno Cardoso & Pedro Bicudo (2010). Lattice SU(2) on GPUs. arXiv: 1010.1486v1
Nuno Cardoso & Pedro Bicudo (2011). SU(2) Lattice Gauge Theory Simulations on Fermi GPUs. J. Comput. Phys. 230:3998-4010. arXiv: 1010.4834v2