## Back to CUDA

11/02/2013

About two years ago I wrote my last post on NVIDIA's CUDA technology (see here). At that time I had added two new graphics cards to my PC, bringing it close to 3 Tflops in single precision for lattice computations. Unfortunately, I had an unlucky turn of events: these cards did not work properly, so they went back to the seller and I was fully refunded. In the meantime the motherboard also failed and the hardware was largely replaced, so for a long time I had no opportunity to work with CUDA and perform the intensive computations I had planned. As is well known, a lot of software exploits this excellent technology provided by NVIDIA and, over these years, it has spread widely in both academia and industry, making researchers' lives a lot easier. I also use it at my workplace, and it is really exciting to have such computational capability at hand at a really affordable price.

Now I am once again able to equip my home computer with a powerful Tesla card. Many of these cards are being retired as they are replaced by more modern ones, so they can be found at a really low price on auction sites like eBay. So I bought a Tesla M1060 for about 200 euros. As the name suggests, this card was not conceived for a personal computer but rather for servers built by some OEMs. This is also evident from its passive cooler: the card is sized to fit into a server, and active cooling through fans is expected to be provided by the server itself. For this reason I added an 80 mm Enermax fan to my chassis (an Enermax Enlobal, from the same manufacturer) to make sure that the motherboard temperature does not reach too high values. My motherboard is an ASUS P8P67 Deluxe. This is a very good board, as usual for ASUS, providing three PCIe 2.0 slots so that, in principle, one can install up to three video cards. However, with a pair of NVIDIA cards in SLI configuration the slots work at x8, while a single video card works at x16. Of course, if you plan to use these configurations you will need a proper PSU. I have a Cooler Master Silent Pro Gold 1000 W, which is well beyond my needs. It is what remains of my previous configuration and is performing really well. I have also changed my CPU, which is now an Intel i3-2125 with two cores at 3.30 GHz and 3 MB of cache. Finally, I added 16 GB of Corsair Vengeance DDR3 RAM.
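To put the x8 versus x16 remark in numbers, here is a back-of-the-envelope sketch assuming the nominal PCIe 2.0 figures (5 GT/s per lane, 8b/10b encoding); real transfers are slower because of protocol overhead. The payload, an SU(2) configuration on a $56^4$ lattice stored as four single-precision reals per link, is a hypothetical example chosen for illustration.

```python
def pcie2_bandwidth_gb_per_s(lanes):
    """Nominal one-direction PCIe 2.0 bandwidth in GB/s (protocol overhead ignored)."""
    signalling_rate_gt_per_s = 5.0    # PCIe 2.0: 5 GT/s per lane
    encoding_efficiency = 8.0 / 10.0  # 8b/10b line code: 8 payload bits every 10 transferred
    return lanes * signalling_rate_gt_per_s * encoding_efficiency / 8.0  # bits -> bytes


def transfer_seconds(n_bytes, lanes):
    """Best-case time to move n_bytes across the link."""
    return n_bytes / (pcie2_bandwidth_gb_per_s(lanes) * 1e9)


# Hypothetical payload: SU(2) configuration on a 56^4 lattice,
# 4 links per site, 4 single-precision reals (4 bytes each) per link.
config_bytes = 56**4 * 4 * 4 * 4
for lanes in (16, 8):
    print(f"x{lanes}: {pcie2_bandwidth_gb_per_s(lanes):.1f} GB/s, "
          f"{transfer_seconds(config_bytes, lanes):.3f} s per configuration")
```

Under these assumptions the roughly 0.63 GB payload moves in about 0.08 s at x16 and 0.16 s at x8: halving the lanes halves the nominal bandwidth, which matters mostly when data are shuttled often between host and card.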

The installation of the card went really smoothly and I had it up and running in a few minutes on Windows 8 Pro 64 bit, after installing the proper drivers. I checked it with Matlab 2011b and with the PGI compilers, with CUDA Toolkit 5.0 properly installed. Everything worked fine. I would like to spend a few words on the PGI compilers, developed by The Portland Group. I have a trial license at home, where I tested them, while at my workplace we have a full license. These compilers make writing accelerated CUDA code remarkably easy: all you need is to insert some directives into your C or Fortran code. I have run some performance tests and the gain is really impressive, without writing a single line of CUDA code. These compilers can easily be used with Matlab to build mex-files or S-functions, even though they are not yet supported by MathWorks (they should be!), and I have verified this without much difficulty for both C and Fortran.

Finally, I would like to give you an idea of how I will use CUDA technology for my aims. Right now I am porting some good code for the scalar field, which I would like to use in the limit of large self-interaction to derive the spectrum of the theory. It is well known that, if the self-interaction is taken to infinity, one recovers the Ising model. But I would like to see what happens for intermediate but large values, as I was not able to find any hint in the literature about this, even though scalar field theory is the workhorse for anyone doing lattice computations. What seems to matter today is showing triviality in four dimensions, which is already well established. As soon as the accelerated code runs properly, I plan to share it here: it is very easy to get good code for lattice QCD, but it is very difficult to get good code for scalar field theory. Stay tuned!
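A minimal Metropolis sketch of what such a simulation involves, assuming the common hopping-parameter form of the lattice action, $S=\sum_x\left[-2\kappa\sum_\mu \phi_x\phi_{x+\mu}+\phi_x^2+\lambda(\phi_x^2-1)^2\right]$. This parametrization and all the names below are my assumptions, not the code being ported. As $\lambda\to\infty$ the field is pinned to $\phi_x=\pm1$ and the action reduces to that of the Ising model, so a large $\lambda$ should give $|\phi_x|$ close to one.

```python
import numpy as np


def local_action(phi_x, neighbour_sum, kappa, lam):
    """Terms of S that involve the field at a single site x (assumed hopping-parameter form)."""
    return -2.0 * kappa * phi_x * neighbour_sum + phi_x**2 + lam * (phi_x**2 - 1.0)**2


def metropolis_sweep(phi, kappa, lam, step, rng):
    """One Metropolis sweep over a periodic lattice of any dimension.
    phi is updated in place; the function returns the acceptance rate."""
    accepted = 0
    for site in np.ndindex(phi.shape):
        # Sum of the field over the 2*d nearest neighbours, with periodic boundaries.
        neighbour_sum = 0.0
        for axis in range(phi.ndim):
            for shift in (1, -1):
                nb = list(site)
                nb[axis] = (nb[axis] + shift) % phi.shape[axis]
                neighbour_sum += phi[tuple(nb)]
        old = phi[site]
        new = old + step * rng.uniform(-1.0, 1.0)
        delta = (local_action(new, neighbour_sum, kappa, lam)
                 - local_action(old, neighbour_sum, kappa, lam))
        # Accept with probability min(1, exp(-delta)).
        if delta <= 0.0 or rng.random() < np.exp(-delta):
            phi[site] = new
            accepted += 1
    return accepted / phi.size


# Usage: a small 2D lattice at large self-interaction, starting from phi = 0.
rng = np.random.default_rng(1)
phi = np.zeros((8, 8))
for sweep in range(200):
    rate = metropolis_sweep(phi, kappa=0.1, lam=50.0, step=0.5, rng=rng)
print(rate, np.abs(phi).mean())
```

A pure Python loop is far too slow for production runs; on a GPU one would typically update the two checkerboard sublattices alternately, since sites of the same color do not interact and can be updated in parallel.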

## Confinement revisited

27/09/2012

Today a definitive updated version of my paper on confinement appeared (see here). I wrote this paper last year after a question posed to me by Owe Philipsen at Bari. The point is: given a decoupling solution for the gluon propagator in the Landau gauge, how does confinement come out? Let me recall that a decoupling solution is a gluon propagator that, at small momenta, reaches a finite non-zero value at zero momentum. All the fits carried out so far on lattice data show that a sum of a few Yukawa-like propagators gives an accurate representation of these data; for an example, see this paper. This kind of propagator formula is sometimes dubbed the Stingl-Gribov formula: it has a fourth-order polynomial in momenta in the denominator and a second-order one in the numerator. It was first postulated by Manfred Stingl in 1995 (see here). It is important to note that, given the presence of a fourth power of momenta, confinement is granted, as a linearly rising potential can be obtained in agreement with lattice evidence. This is also in agreement with the area law first put forward by Kenneth Wilson.
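The link between the Stingl-Gribov form and "a sum of Yukawa-like propagators" is just a partial-fraction decomposition. A sketch, assuming the form $D(p^2)=Z\,(p^2+M_1^2)/(p^4+M_2^2p^2+M_3^4)$ with illustrative parameter values that are not fitted to any lattice data: when $M_2^4<4M_3^4$ the two squared "masses" are complex conjugate, yet their sum stays real, and $D(0)=Z M_1^2/M_3^4$ is finite, as for a decoupling solution.

```python
import cmath

import numpy as np


def gribov_stingl(p2, z, m1sq, m2sq, m3four):
    """D(p^2) = Z (p^2 + M1^2) / (p^4 + M2^2 p^2 + M3^4)."""
    return z * (p2 + m1sq) / (p2**2 + m2sq * p2 + m3four)


def yukawa_terms(z, m1sq, m2sq, m3four):
    """Partial fractions: D(p^2) = r_a / (p^2 + a) + r_b / (p^2 + b).
    a and b are the roots of x^2 - M2^2 x + M3^4 = 0, complex conjugate
    when M2^4 < 4 M3^4. Assumes a != b."""
    disc = cmath.sqrt(m2sq**2 - 4.0 * m3four)
    a = (m2sq + disc) / 2.0
    b = (m2sq - disc) / 2.0
    r_a = z * (m1sq - a) / (b - a)
    r_b = z * (m1sq - b) / (a - b)
    return (r_a, a), (r_b, b)


# Illustrative parameters (not fitted to any lattice data), arbitrary units.
params = dict(z=1.0, m1sq=2.0, m2sq=0.5, m3four=1.0)
p2 = np.linspace(0.0, 10.0, 6)
(r_a, a), (r_b, b) = yukawa_terms(**params)
recombined = r_a / (p2 + a) + r_b / (p2 + b)
print(gribov_stingl(0.0, **params))  # finite at zero momentum: Z M1^2 / M3^4
print(a, b)
print(np.max(np.abs(recombined - gribov_stingl(p2, **params))))
```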

At that time I was convinced that a decoupling solution was enough, and so I pursued my analysis, arriving in a first version of the paper at the (wrong) conclusion that screening could be enough. The strong force would then have to saturate and, perhaps, such a saturation would eventually be seen on the lattice by moving to larger distances. This is not true, as I know today, and I learned it from a beautiful paper by Vicente Vento, Pedro González and Vincent Mathieu. They solved the Dyson-Schwinger equations in the deep infrared to obtain the interquark potential. The decoupling solution appears at the one-gluon exchange level and, in this approximation, they prove that the potential they get is just a screening one, in close agreement with mine and with any other decoupling solution given in closed analytical form. So the decoupling solution does not seem to agree with lattice evidence, which shows a linearly rising, perfectly confining potential, in agreement with what Wilson pointed out in his classic 1974 work. My initial analysis of this problem was incorrect, and Owe Philipsen was right to point out this difficulty in my approach.

This question never left my mind and, with the opportunity to go to Montpellier this year to give a talk (see here), I presented a solution to this problem for the first time. The point is that one needs a fourth-order term in the denominator of the propagator. This can happen if one is able to compute higher-order corrections beyond the simplest one-gluon exchange approximation (see here). In my approach I can compute loop corrections to the gluon propagator. The next-to-leading one is a two-loop term that produces the right term in the denominator of the propagator. Moreover, I am able to obtain the field renormalization constant, and so I also get a running mass and coupling. At Montpellier I gave an idea of how this computation should be performed, and in these days I have completed it.
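To see why the fourth power matters, recall the textbook relation between a static exchange propagator and the potential it generates. This is a sketch under the simplest assumptions (static limit, three spatial dimensions, color factor absorbed into $g^2$), not the computation of the paper:

$$
V(r) = -g^2 \int \frac{d^3 p}{(2\pi)^3}\, e^{i\mathbf{p}\cdot\mathbf{r}}\, D(p^2),
\qquad
D(p^2)=\frac{1}{p^2+m^2}\;\Longrightarrow\; V(r)=-\frac{g^2}{4\pi}\,\frac{e^{-mr}}{r},
\qquad
D(p^2)=\frac{1}{p^4}\;\Longrightarrow\; V(r)=\frac{g^2}{8\pi}\,r+\text{const}.
$$

A sum of Yukawa terms therefore gives only screened potentials, while a pure $1/p^4$ term gives a linearly rising one, with string tension $g^2/(8\pi)$ in this naive normalization; the constant is infrared divergent but drops out of the force.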

The result has been a shocking one. Not only does one get a linearly rising potential, but the string tension is proportional to the one obtained in d = 2+1 by V. Parameswaran Nair, Dimitra Karabali and Alexandr Yelnikov (see here)! This means that, apart from numerical factors and accounting for physical dimensions, the equation for the string tension in 3 and 4 dimensions is the same. We note that the result of Nair, Karabali and Yelnikov is in close agreement with lattice data. In 3 dimensions the coupling $g^2$ has the dimension of a mass, so the ratio $\sqrt{\sigma}/g^2$ is a pure number that can be computed explicitly on the lattice. So our conclusions support each other.
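For orientation, the leading-order result of the Hamiltonian approach in 2+1 dimensions (Karabali, Kim and Nair), which the cited Karabali-Nair-Yelnikov paper refines with corrections, reads $\sqrt{\sigma}/g^2=\sqrt{(N^2-1)/(8\pi)}$ for SU(N). The sketch below only evaluates this leading formula; it is not the four-dimensional result discussed in the post.

```python
import math


def kkn_sqrt_sigma_over_g2(n_colors):
    """Leading-order Karabali-Kim-Nair string tension in 2+1 dimensions:
    sqrt(sigma) / g^2 = sqrt((N^2 - 1) / (8 pi)) for SU(N)."""
    return math.sqrt((n_colors**2 - 1) / (8.0 * math.pi))


for n in (2, 3, 4, 5):
    print(n, round(kkn_sqrt_sigma_over_g2(n), 4))
```

Because $g^2$ carries the mass dimension in 2+1 dimensions, these numbers can be compared directly with lattice determinations of $\sqrt{\sigma}/g^2$.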

These results are really important, as they give strong support to the ideas that have emerged in these years about the behavior of the propagators of a Yang-Mills theory at low energies. We are even closer to a clear understanding of confinement and of the way mass emerges at the macroscopic level. It is important to point out that the string tension of a Yang-Mills theory is one of the parameters that any serious theoretical approach, aiming to go beyond a simple phenomenological one, should be able to capture. We can say that the challenge is open.

Marco Frasca (2011). Beyond one-gluon exchange in the infrared limit of Yang-Mills theory arXiv arXiv: 1110.2297v4

Kenneth G. Wilson (1974). Confinement of quarks Phys. Rev. D 10, 2445–2459 (1974) DOI: 10.1103/PhysRevD.10.2445

Attilio Cucchieri, David Dudal, Tereza Mendes, & Nele Vandersickel (2011). Modeling the Gluon Propagator in Landau Gauge: Lattice Estimates of Pole Masses and Dimension-Two Condensates arXiv arXiv: 1111.2327v1

M. Stingl (1995). A Systematic Extended Iterative Solution for QCD Z.Phys. A353 (1996) 423-445 arXiv: hep-th/9502157v3

P. Gonzalez, V. Mathieu, & V. Vento (2011). Heavy meson interquark potential Physical Review D, 84, 114008 arXiv: 1108.2347v2

Marco Frasca (2012). Low energy limit of QCD and the emerging of confinement arXiv arXiv: 1208.3756v2

Dimitra Karabali, V. P. Nair, & Alexandr Yelnikov (2009). The Hamiltonian Approach to Yang-Mills (2+1): An Expansion Scheme and Corrections to String Tension Nucl.Phys.B824:387-414,2010 arXiv: 0906.0783v1

## An interesting review

14/09/2011

I have not written posts for some time, but for a good reason: I was in Leipzig for the IRS 2011 Conference, a very interesting event in a beautiful city. It was inspiring to be in the city where Bach spent a great part of his life. Back home, I checked my daily arXiv listings as usual and found an important review by Boucaud, Leroy, Le Yaouanc, Micheli, Pène and Rodríguez-Quintero. This is the French group that produced striking results in the analysis of the Green functions of Yang-Mills theory.

In this paper they do a great job of reviewing the current situation and clarifying the main aspects of the analysis carried out with Dyson-Schwinger equations. These are a tower of equations for the n-point functions of a quantum field theory that can generally be solved only through some truncation (with an exception, see here) that cannot be completely controlled. The reason is that the equation for a lower-order function depends on higher-order n-point functions and so, at some point, one has to fix the behavior of some of these higher-order functions, truncating the hierarchy. But this choice is generally not under control.

In the history of these techniques there is a key date, Regensburg 2007, when a wall just came down. Until then, the common wisdom was a scenario with a gluon propagator going to zero as momenta go to zero while, in the same limit, the ghost propagator should go to infinity faster than in the free case: the gluon propagator was suppressed and the ghost propagator enhanced in the infrared. On the lattice such behavior was never explicitly observed, but it was argued that the main reason was the small volumes considered in these computations. In 2007, lattice computations reached huge volumes, up to $(27\ \mathrm{fm})^4$, and the inescapable conclusion was that the lattice produced another solution: a gluon propagator reaching a finite non-zero value and a ghost propagator behaving exactly like that of a free particle. This was also the prediction of the French group, together with other researchers such as Cornwall, Papavassiliou, Aguilar, Binosi and Natale. So this new solution entered the mainstream of the analysis of Yang-Mills theory in the infrared and was dubbed the “decoupling solution” to distinguish it from the former one, called instead the “scaling solution”.
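For reference, the two infrared behaviors just described can be written in terms of the gluon dressing function $Z(p^2)$, with gluon propagator $D(p^2)=Z(p^2)/p^2$, and the ghost dressing function $G(p^2)$, with ghost propagator proportional to $G(p^2)/p^2$. The exponent $\kappa\approx 0.595$ is the value commonly quoted for the scaling solution; the notation is mine, not the review's:

$$
\text{scaling:}\qquad Z(p^2)\sim (p^2)^{2\kappa},\quad G(p^2)\sim (p^2)^{-\kappa}
\quad\Longrightarrow\quad D(p^2)\sim (p^2)^{2\kappa-1}\to 0,\qquad \frac{G(p^2)}{p^2}\sim (p^2)^{-1-\kappa},
$$

$$
\text{decoupling:}\qquad 0<D(0)<\infty,\quad G(0)\ \text{finite}
\quad\Longrightarrow\quad \frac{G(p^2)}{p^2}\sim \frac{1}{p^2}.
$$

With $2\kappa-1\approx 0.19>0$ the scaling gluon propagator vanishes at zero momentum, while the scaling ghost is more singular than the free $1/p^2$; in the decoupling case the ghost stays free-like.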

In this review the authors point out an important conclusion: the reason why the decoupling solution was missed, and only the scaling one identified, is that the truncations used forced the Dyson-Schwinger equations toward a finite non-zero value of the strong coupling constant in the infrared. This is a crucial point, as it means that the authors who found the scaling solution were implicitly assuming a non-trivial infrared fixed point for Yang-Mills theory. This was also the recurring idea in those days but, while this is surely true for QCD, a world without quarks does not exist and, a priori, nothing can be said about Yang-Mills theory, a theory with only gluons and no quarks. Quarks change the situation dramatically, as can also be seen from asymptotic freedom: we are safe because there are only six flavors, while more than sixteen would spoil it. But nothing can be said about Yang-Mills theory in the infrared, as such a theory is not realized in nature without interacting fermionic fields.

Indeed, as pointed out in the review, the running coupling was seen to behave as in the following figure (this was obtained by the German group, see here)

Running coupling of a pure Yang-Mills theory as computed on the lattice

This result is quite shocking and completely counterintuitive. It suggests, even if it does not yet confirm, that a pure Yang-Mills theory could have a trivial infrared fixed point! This defies common wisdom and can explain why earlier researchers using the Dyson-Schwinger approach could have missed the decoupling solution. Indeed, this solution seems properly consistent with a trivial fixed point, and this can also be inferred from the quality of the fit of the gluon propagator with a Yukawa-like propagator, provided we content ourselves with good agreement just in the deep infrared and in the deep ultraviolet, where asymptotic freedom sets in. In fact, with a trivial fixed point the theory is free in this limit, but you cannot expect agreement over the whole range of energies with a free propagator.
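On the lattice, the running coupling shown in such figures is commonly defined from the ghost-gluon vertex as $\alpha(p^2)\propto Z(p^2)\,G(p^2)^2$, with $Z$ the gluon and $G$ the ghost dressing functions. A toy sketch, assuming pure power laws for the scaling solution and a massive gluon with constant ghost dressing for the decoupling one (both illustrative choices, not fits), shows why the first gives a finite infrared fixed point and the second a coupling vanishing like $p^2$:

```python
import math

# Commonly quoted infrared exponent of the scaling solution, about 0.595.
KAPPA = (93.0 - math.sqrt(1201.0)) / 98.0


def coupling_scaling(p2, kappa=KAPPA):
    """Z(p^2) G(p^2)^2 with Z ~ (p^2)^(2 kappa) and G ~ (p^2)^(-kappa): the powers cancel."""
    z = p2 ** (2.0 * kappa)
    g = p2 ** (-kappa)
    return z * g**2


def coupling_decoupling(p2, mass2=1.0):
    """Z G^2 for a toy massive gluon D = 1/(p^2 + m^2), so Z = p^2 D, and constant G = 1."""
    z = p2 / (p2 + mass2)
    g = 1.0
    return z * g**2


for p2 in (1e-1, 1e-3, 1e-5):
    print(p2, coupling_scaling(p2), coupling_decoupling(p2))
```

Overall normalizations are arbitrary here; only the infrared trend matters: constant for the scaling case, going to zero for the decoupling case.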

Currently, the right infrared behavior of the two-point functions of Yang-Mills theory is still hotly debated, and what is at stake here is the correct understanding and management of low-energy QCD. This is one of the most fundamental problems in physics, and one to which I would like to know the answer.

Ph. Boucaud, J. P. Leroy, A. Le Yaouanc, J. Micheli, O. Pène, & J. Rodríguez-Quintero (2011). The Infrared Behaviour of the Pure Yang-Mills Green Functions arXiv arXiv: 1109.1936v1

Marco Frasca (2009). Exact solution of Dyson-Schwinger equations for a scalar field theory arXiv arXiv: 0909.2428v2

I. L. Bogolubsky, E.-M. Ilgenfritz, M. Müller-Preussker, & A. Sternbeck (2009). Lattice gluodynamics computation of Landau-gauge Green’s functions in the deep infrared Phys.Lett.B676:69-73,2009 arXiv: 0901.0736v3


## It was twenty years ago today . . .

16/08/2011

With these beautiful words begins a recollection paper by the founder of arXiv, Paul Ginsparg. It is well worth reading, as this history spans a number of years exactly overlapping the computer revolution that has definitely changed our lives. Through these new information tools Paul also changed the way researchers approach scientific communication. It is a revolution that has not stopped yet: all the journals to which I submit my papers have a link to arXiv for direct uploading of the preprint. This change has also had a great impact on the way these same journals present themselves to authors, readers and referees on their websites.

For my readers I would just like to point out how relevant all this was for our community through the case of Grisha Perelman. I think all of you are well aware that Perelman never published his papers in a journal: you can find them all on arXiv. Those preprints were worth as much as a Fields Medal and a Millennium Prize. Not bad, I should say, for a handful of unpublished papers. Indeed, it is common for a paper to be widely discussed well before its publication, and a preprint often becomes a case in the community without ever seeing publication. It is quite common, for those of us doing research, to console colleagues complaining about a harsh peer-review procedure by saying that today there is arXiv, and that is enough to make your work widely known.

I have been a submitter since 1994, almost from the very start, and I hope that the run of successes of this idea will never end.

Finally, to show how useful arXiv is for our community, I would like to point out a couple of papers for your summer reading. The first one is by R. Aouane, V. Bornyakov, E.-M. Ilgenfritz, V. Mitrjushkin, M. Müller-Preussker and A. Sternbeck. My readers should know that these researchers always do fine work and obtain important results from their lattice computations. The same happens here, where they study the gluon and ghost propagators at finite temperature in the Landau gauge. Their conclusion about Gribov copies is really striking and supports my general view on this matter (see here): Gribov copies are not essential, even when the temperature is raised. Besides, they discuss the question of a proper order parameter to identify the phase transition that we know exists in this case.

The next paper is authored by Tereza Mendes, Axel Maas and Stefan Olejnik (see here). The idea of this work is to consider a gauge, the $\lambda$-gauge, with a free parameter interpolating between different gauges, in order to check the smoothness of the transition and how the propagators change. They reach a volume of $70^4$, but Tereza told me that the errors are still too large for a neat comparison with smaller volumes. In any case, this is a route worth pursuing, and I am curious to see how the interpolated propagator behaves in the deep infrared on larger lattices.

Discussions on the identification of the Higgs are still very much alive (you can see here). Take a look and enjoy!

Paul Ginsparg (2011). It was twenty years ago today … arXiv arXiv: 1108.2700v1

R. Aouane, V. Bornyakov, E.-M. Ilgenfritz, V. Mitrjushkin, M. Müller-Preussker, & A. Sternbeck (2011). Landau gauge gluon and ghost propagators at finite temperature from quenched lattice QCD arXiv arXiv: 1108.1735v1

Axel Maas, Tereza Mendes, & Stefan Olejnik (2011). Yang-Mills Theory in lambda-Gauges arXiv arXiv: 1108.2621v1

## Evidence of a QCD critical endpoint at RHIC

21/06/2011

A critical endpoint in QCD is a kind of holy grail in nuclear physics. It has been theorized as a point where deconfinement occurs and hadronic matter gives way to some kind of plasma of quarks and gluons. We know that a chiral symmetry breaking transition was proposed several years ago, and we recently gave a proof of the existence of such a transition (see here). But here the situation is more complex: we have essentially two physical variables to describe the phase diagram, temperature and chemical potential. This makes lattice computations a kind of nightmare, and the reason is the sign problem. Some years ago Zoltan Fodor and Sandor Katz came out with a pioneering paper (see here) doing lattice computations at finite chemical potential. There the fermion determinant becomes complex, so the weight in the path integral is no longer positive and cannot be sampled as a probability: this is the infamous sign problem. It arises from the discretized, numerical approach, and a theoretical physicist working analytically can happily live just ignoring it. Fodor and Katz evaded the problem just by taking an absolute value, but this approach was criticized, casting doubt on their results at non-zero chemical potential. It should be said that they gave evidence of the existence of the critical point, and their results at zero chemical potential are surely correct, in close agreement with my findings and those of others. A lucid statement of the problems of lattice computations at finite temperature and density was recently given by Philippe de Forcrand (see here).
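The difficulty can be seen in a toy model that has nothing to do with QCD but shares the structure of the problem: $V$ independent sites with complex weight $e^{-x^2/2-i\mu x}$. Sampling with the real part of the action only (the phase-quenched, "absolute value" weight) and reweighting requires the average phase $\langle e^{-i\mu\sum x}\rangle$, whose exact value here is $e^{-V\mu^2/2}$. The sketch shows the signal decaying exponentially with the volume while the statistical error does not; the model and function names are illustrative assumptions.

```python
import numpy as np


def average_phase(volume, mu, samples, rng):
    """Toy model: `volume` independent sites with weight exp(-x^2/2 - i mu x).
    Sample x from the phase-quenched Gaussian weight and estimate the average
    phase <exp(-i mu sum x)>. Returns (estimate, statistical error, exact value)."""
    x = rng.normal(size=(samples, volume))
    phase = np.exp(-1j * mu * x.sum(axis=1))
    estimate = phase.mean().real  # imaginary part averages to zero by symmetry
    error = phase.real.std() / np.sqrt(samples)
    exact = np.exp(-volume * mu**2 / 2.0)
    return estimate, error, exact


rng = np.random.default_rng(0)
for volume in (4, 40, 200):
    estimate, error, exact = average_phase(volume, mu=0.5, samples=20_000, rng=rng)
    print(volume, exact, estimate, error)
```

Keeping the relative error fixed therefore needs a number of samples growing like $e^{V\mu^2}$, which is why large volumes at finite density are so hard.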

So far, people have produced several results by working with phenomenological models such as the Nambu-Jona-Lasinio model or the sigma model. This approach arises from our current inability to manage QCD at very low energies but, on the other hand, we are well aware that these models seem to represent reality quite well. The reason is that the Nambu-Jona-Lasinio model really is the low-energy limit of QCD, but I will not discuss this matter here, having done so before (see here). Besides, the sigma model arises naturally in the low-energy limit when interacting with quarks. The sigma field is a true physical field that drives the phase transitions in low-energy QCD.

While the hunt for the critical point on the lattice has been open since the paper by Fodor and Katz, the experimental side is somewhat more difficult to exploit. The only facility at our disposal is RHIC, and few proposals to identify the critical point from experimental data were available until a fine proposal by Misha Stephanov a few years ago (see here and here). The idea runs as follows. At the critical point, fluctuations are no longer expected to be Gaussian, and correlations extend over all the hadronic matter as the correlation length diverges. Non-Gaussianity implies that cumulants, linked to the higher-order moments of the probability distribution, depend on the correlation length through some power and, in particular, moments like skewness and kurtosis, which measure the deviation from Gaussianity, start to change. In particular, kurtosis is expected to change sign. So, if we are able to measure such a deviation in a laboratory facility, we are done: we get evidence for a critical point and critical behavior of hadronic matter. We just note that Stephanov carries out his computations using a sigma model, and this is a really brilliant insight.
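The quantities involved are easy to estimate from event-by-event data. A sketch, with cumulants $c_n$ computed from central moments (no small-sample bias corrections) and synthetic samples in place of real data: a Gaussian gives vanishing skewness and excess kurtosis, while for a Poisson distribution all cumulants are equal, so $\kappa\sigma^2=c_4/c_2=1$, a natural non-critical baseline. The product $\kappa\sigma^2$ is often used because cumulants grow with the volume, which cancels in the ratio.

```python
import numpy as np


def cumulant_ratios(samples):
    """Skewness S = c3 / c2^(3/2), excess kurtosis kappa = c4 / c2^2 and
    kappa*sigma^2 = c4 / c2, from the central moments of a sample
    (large-sample estimators, no bias corrections)."""
    x = np.asarray(samples, dtype=float)
    d = x - x.mean()
    c2 = np.mean(d**2)
    c3 = np.mean(d**3)
    c4 = np.mean(d**4) - 3.0 * c2**2
    return {"S": c3 / c2**1.5, "kappa": c4 / c2**2, "kappa_sigma2": c4 / c2}


rng = np.random.default_rng(0)
gaussian = cumulant_ratios(rng.normal(size=400_000))
poisson = cumulant_ratios(rng.poisson(lam=1.0, size=400_000))
print("Gaussian:", gaussian)
print("Poisson: ", poisson)
```

A critical signal would show up as a departure of these ratios from their non-critical baselines, for instance a kurtosis turning negative.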

At RHIC first evidence of this has been obtained by the STAR Collaboration (see here). These are preliminary results, and further data are expected this year. The main result is given in the following figure. We see a comparison with lattice data; the red balls are for Au+Au collisions, and the kurtosis goes down to negative values! The agreement with lattice data is striking, and this is already evidence for a critical endpoint. But this is not enough, as can be seen from the large error bars. Indeed, further data are needed to draw a definitive conclusion and, as said, these are expected this year. Anyhow, this is already a shocking result. Surely, we will stay tuned for this mounting evidence of a critical endpoint. It would represent a major discovery for nuclear physics and, in some way, it would make lattice computations easier through a proper understanding of how the sign problem should be settled.

Marco Frasca (2011). Chiral symmetry in the low-energy limit of QCD at finite temperature arXiv arXiv: 1105.5274v2

Z. Fodor, & S. D. Katz (2001). Lattice determination of the critical point of QCD at finite T and $\mu$ JHEP 0203 (2002) 014 arXiv: hep-lat/0106002v2

Philippe de Forcrand (2010). Simulating QCD at finite density PoS (LAT2009)010, 2009 arXiv: 1005.0539v2

M. A. Stephanov (2008). Non-Gaussian fluctuations near the QCD critical point Phys.Rev.Lett.102:032301,2009 arXiv: 0809.3450v1

Christiana Athanasiou, Krishna Rajagopal, & Misha Stephanov (2010). Using Higher Moments of Fluctuations and their Ratios in the Search for the QCD Critical Point Physical Review D arXiv: 1006.4636v2

Xiaofeng Luo (2011). Probing the QCD Critical Point with Higher Moments of Net-proton Multiplicity Distributions arXiv arXiv: 1106.2926v1

## Chiral condensates in a magnetic field: Accepted!

20/04/2011

As my readers may know, I wrote a paper in collaboration with Marco Ruggieri (see here). Marco is currently working at the Yukawa Institute in Kyoto (Japan). The great news is that our paper has been accepted for publication in Physical Review D. I am really happy about this very good result of a collaboration that I hope will endure. Currently, we are working on a proof of existence of the critical endpoint of QCD using the Nambu-Jona-Lasinio model. This is an open question that is hard to answer, also because of a fundamental problem encountered on the lattice: the so-called sign problem. So a mathematical proof would be a breakthrough in the field, with the possibility of experimental confirmation at laboratory facilities.

Marco himself proposed a novel approach to detect the critical endpoint while bypassing the sign problem (see here).

So, I hope I have managed to convey to my readership the excitement of this line of research and how strongly it is entangled with the understanding of low-energy QCD and with the deeper question of the mass gap in Yang-Mills theory. Surely, I will keep posting on this. Just stay tuned!

Marco Frasca, & Marco Ruggieri (2011). Magnetic Susceptibility of the Quark Condensate and Polarization from Chiral Models arXiv arXiv: 1103.1194v1

Philippe de Forcrand (2010). Simulating QCD at finite density PoS (LAT2009)010, 2009 arXiv: 1005.0539v2

## QCD at finite temperature: Does a critical endpoint exist?

05/04/2011

Marco Ruggieri is currently a post-doc fellow at the Yukawa Institute for Theoretical Physics in Kyoto (Japan). Marco got his PhD at the University of Bari in Italy and spent a six-month period at CERN. Currently, his main research areas are QCD at finite temperature and high density, QCD behavior in strong magnetic fields, and effective models for QCD; you can find a complete CV at his site. So, in view of his expertise, I asked him for a guest post on my blog to give an idea of the current situation of these studies. Here it is.

It is well known that Quantum Chromodynamics (QCD) is the most accredited theory of the strong interactions. One of the most important problems of modern QCD is to understand how color confinement and chiral symmetry breaking are affected by a finite temperature and/or a finite baryon density. As for the former, Lattice simulations have convinced us that both deconfinement and (approximate) chiral symmetry restoration take place in a narrow range of temperatures; see this recent work for a review. On the other hand, it is problematic to perform Lattice simulations at finite quark chemical potential in true QCD, namely with three colors, because of the so-called sign problem; see here for a recent review on this topic. It is thus very difficult to access the high-density region of QCD starting from first-principles calculations.

Despite this difficulty, a lot of work has been done to avoid the sign problem and make quantitative predictions about the shape of the phase diagram of three-color QCD in the temperature-chemical potential plane; see here again for a review. One of the most important theoretical issues along this line is the search for the so-called critical endpoint of the QCD phase diagram, namely the point where a crossover line and a first-order transition line meet. Its existence was suggested by Asakawa and Yazaki (AY) several years ago (see here) using an effective chiral model; in 2002, Fodor and Katz (FK) performed the first Lattice simulation (see here) showing that the idea of AY could be realized in QCD with three colors. However, the estimate by FK is seriously affected by the sign problem. Hence, it is still debated nowadays whether the critical endpoint exists in QCD or not.

Having referred to this for a comprehensive review of some of the techniques adopted by the Lattice community to avoid the sign problem and detect the critical endpoint, it is worth citing an article by Marco Ruggieri, which appeared a few days ago on arXiv, in which an exotic possibility to detect the critical endpoint by means of Lattice simulations avoiding the sign problem is proposed, see here. With the author's permission, we report the abstract below:

We suggest the idea, supported by concrete calculations within chiral models, that the critical endpoint of the phase diagram of Quantum Chromodynamics with three colors can be detected, by means of Lattice simulations of grand-canonical ensembles with a chiral chemical potential, $\mu_5$, conjugated to chiral charge density. In fact, we show that a continuation of the critical endpoint of the phase diagram of Quantum Chromodynamics at finite chemical potential, $\mu$, to a critical end point in the temperature-chiral chemical potential plane, is possible. This study paves the way of the mapping of the phases of Quantum Chromodynamics at finite $\mu$, by means of the phases of a fictitious theory in which $\mu$ is replaced by $\mu_5$.

Rajan Gupta (2011). Equation of State from Lattice QCD Calculations arXiv arXiv: 1104.0267v1

Philippe de Forcrand (2010). Simulating QCD at finite density PoS (LAT2009)010, 2009 arXiv: 1005.0539v2

M. Asakawa, & K. Yazaki (1989). Chiral restoration at finite density and temperature Nuclear Physics A, 504 (4), 668-684 DOI: 10.1016/0375-9474(89)90002-X

Z. Fodor, & S. D. Katz (2001). Lattice determination of the critical point of QCD at finite T and $\mu$ JHEP 0203 (2002) 014 arXiv: hep-lat/0106002v2

Marco Ruggieri (2011). The Critical End Point of Quantum Chromodynamics Detected by Chirally Imbalanced Quark Matter arXiv arXiv: 1103.6186v1

## CUDA: Upgrading to 3 Tflops

29/03/2011

When I was a graduate student I heard a lot about the wonderful performance of the Cray-1 vector supercomputer and the promise of exploring unknown fields of knowledge with this unleashed power. This admirable machine reached a peak of 250 Mflops. Its successor, the Cray-2, performed at 1700 Mflops and, for scientists, this was indeed a new era in attacking difficult mathematical problems. But when you look at QCD all these seem just toys for a kindergarten, and one is not even able to perform the simplest computations to extract meaningful physical results. So physicists started to design very specialized machines in the hope of improving the situation.

Today the situation has changed dramatically. The reason is that the growing computational demands of video output require massively parallel computation of very simple mathematical tasks. But these mathematical tasks are all one needs to perform scientific computations. The flagship company in this area is Nvidia, which produced CUDA for its graphics cards. This means that today one can have high-performance parallel computation on a desktop computer, and we are talking about a capability of several Teraflops! All this at a very affordable cost. With a few bucks you can have on your desktop a machine performing thousands of times better than a legendary Cray machine. Today, the counterpart of a Cray-1 is a CUDA cluster breaking the Petaflops barrier! Something people were dreaming of just a few years ago. This means that you can do complex and meaningful QCD computations in your office, whenever you like, without the need to share CPU time with anybody, pushing your machine to its best. All this with costs that are no longer a concern.

So, with this opportunity in sight, I jumped on the bandwagon and a few months ago I upgraded my home desktop computer into a CUDA supercomputer. The first idea was just to buy old hardware on eBay at very low cost and build on what was already in my machine. In 2008 the top of the Nvidia GeForce line was the 9800 GX2. This card comes equipped with two GPUs with 128 cores each, 0.5 GB of RAM per GPU, and support for CUDA compute capability 1.1. No double precision is available: this option appeared some time later with cards of compute capability 1.3. You can find such a card on eBay for about 100-120 euros. You will also need a proper motherboard. Again in 2008, Nvidia produced the nForce 790i Ultra chipset, well suited to this aim. A motherboard based on it supports a 3-way SLI configuration and, as my readers know, I installed up to three 9800 GX2 cards on it. I got this board on eBay for a price similar to that of the video cards. Also, before starting this adventure, I already had a 750 W Cooler Master power supply. It did not take long to have this hardware up and running, reaching the considerable computational power of 2 Tflops in single precision, all this with hardware at least 3 years old! For the operating system I chose Windows 7 Ultimate 64 bit, after an initial failure with Ubuntu Linux 64 bit.
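The peak figures quoted here come from a simple product: GPUs × cores × shader clock × floating-point operations per core per cycle. A sketch assuming a 1.5 GHz shader clock and one multiply-add (2 flops) per core per cycle for the 9800 GX2; neither number comes from the text, and counting more flops per cycle would raise the result. Real lattice codes, usually limited by memory bandwidth, sustain only a fraction of the peak.

```python
def peak_gflops(gpus, cores_per_gpu, shader_clock_ghz, flops_per_core_per_cycle):
    """Theoretical single-precision peak in Gflops (an upper bound, not sustained speed)."""
    return gpus * cores_per_gpu * shader_clock_ghz * flops_per_core_per_cycle


# 9800 GX2: two GPUs with 128 cores each (from the text).
# Assumptions of this sketch: 1.5 GHz shader clock, 2 flops per core per cycle.
per_card = peak_gflops(gpus=2, cores_per_gpu=128, shader_clock_ghz=1.5,
                       flops_per_core_per_cycle=2)
three_cards = 3 * per_card
cray1_gflops = 0.25  # the 250 Mflops Cray-1 peak quoted above
print(per_card, three_cards, three_cards / cray1_gflops)
```

With these assumptions three cards give about 2.3 Tflops, of the same order as the 2 Tflops quoted above and several thousand times the Cray-1 peak.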

There is a wide choice of QCD software on the web. The most widespread is surely the MILC code. This code is written for a multi-processor environment and represents the effort of several people over several years of development. It is well written and rather well documented. A lot of lattice QCD papers published in the most relevant archival journals rely on this code. Quite recently its authors started porting it to CUDA GPUs, following a trend common to all of academia. Of course, for my aims, being a lone CUDA user with little time for development, I faced the unattractive prospect of porting this code to GPUs myself. But, at the same time I upgraded my machine, Pedro Bicudo and Nuno Cardoso published their paper on arXiv (see here) and promptly made available their code for SU(2) QCD on CUDA GPUs. You can download their up-to-date code here (if you plan to use this code, just let them know, as they are very helpful). So, I ported this code, originally written for Linux, to Windows 7 and got it up and running, obtaining correct output for lattices up to $56^4$, working only in single precision since no double precision was available on this hardware. The execution time was acceptable: a few seconds on the GPUs, plus some more at program start due to data exchange between CPU and GPUs. So, already at this stage I am able to do professional-level work with lattice computations. Just one small complaint is in order here. On the web it is very easy to find good code to perform lattice QCD simulations, but nothing can be found for post-processing of configurations. This code is as important as the former: without computing observables one can do nothing with configurations, or with whatever else lattice QCD yields on however powerful a machine.
So, I think it would be worthwhile to have both kinds of code available, to get spectra, propagators and so on starting from a standard configuration file, independently of the program that generated it. Similarly, it appears almost impossible to find code for lattice scalar field theory (many thanks to Colin Morningstar for providing me with code for 2+1 dimensions!). This is a workhorse for people learning lattice computation, and it would be helpful, at least for pedagogical reasons, to make it available in the same way QCD code is. But now, I leave complaints aside and go to the most interesting part of this post: the upgrade.

These days I made another effort to improve my machine. The idea is to gain performance, with larger lattices and shorter execution times, while reducing overheating and noise. Besides, the hardware I worked with was so old that its architecture did not support double precision. So, I decided to buy a couple of GeForce GTX 580 cards. This is the top of the GeForce line (the GTX 590 puts two GTX 580-class GPUs on a single card) and yields 1.5 Tflops in single precision (the 9800 GX2 stopped at 1 Tflops in single precision). It has the Fermi architecture (CUDA compute capability 2.0) and supports double precision, although GeForce Fermi boards run it at one eighth of the single-precision rate, that is, roughly 0.2 Tflops per card. But, as happens for all video cards, a model has several producers, and these producers may decide to change something in performance. After some difficulties with the dealer, I was able to get a couple of high-performance MSI N580GTX Twin Frozr II/OC cards at a very convenient price. With respect to Nvidia's reference card, these come overclocked, with a proprietary cooling system that lowers the temperature by 19 °C, and they use higher-quality components. I received these cards yesterday and installed them immediately. In a few minutes Windows 7 installed the drivers. I recompiled my executable and finally performed a successful computation on a $66^4$ lattice with the latest version of Nuno and Pedro's code. Then I checked the temperature of the cards with Nvidia System Monitor: 60 °C for each card, with the cooler working at 106%. This was at least 24 °C lower than my 9800 GX2 cards! Execution times on the GPUs were cut at least in half. This new configuration grants 3 Tflops in single precision and about 0.4 Tflops in double precision. My present hardware configuration is the following:

So far, I have not had much time to experiment with the new hardware. I hope to tell you more in the near future. Just stay tuned!

Nuno Cardoso & Pedro Bicudo (2010). SU(2) Lattice Gauge Theory Simulations on Fermi GPUs. J. Comput. Phys. 230:3998-4010, 2011. arXiv: 1010.4834v2

## CUDA: Lattice QCD at your desktop

15/03/2011

As my readers know, I have built a CUDA machine on my desktop for a few bucks to have lattice QCD at home. There are a couple of reasons to write this post, and the most important is that Pedro Bicudo and Nuno Cardoso have got their paper published in an archival journal (see here). They produced very good code to run SU(2) lattice QCD on a CUDA machine (download link), which I have up and running on my computer. They are working on the SU(3) version, which is almost ready. I hope to say more about this in the very near future. Currently, I am porting the MILC code for the computation of the gluon propagator to my machine, to use it on the configurations I am able to generate with Nuno and Pedro's code. This MILC code fits my needs quite well and is very well written. This task will take me some time, and unfortunately I do not have much of it.

Presently, Nuno and Pedro's code runs perfectly on my machine (see my preceding post here). There was no problem in the code; I had just missed a compiler option to make the GPUs communicate through the MPI library. Once I corrected this, everything ran like a charm. From a hardware standpoint, I was unable to get my machine working properly with three cards, and the reason was simply overheating. A chip on the motherboard ended up below one of the video cards, resulting in erratic behavior of the chipset. Windows 7 even saw a floppy disk drive when I have none! So, I decided to work with just two cards, and now the system works perfectly, is stable, and Windows 7 always sees four GPUs.

Nuno sent me an updated version of their code. I will get it running as soon as possible. Of course, I expect this porting to be as smooth as before, taking just a few minutes of my time. I suggested to him that they keep their site up to date with the latest version of the code, as it is evolving continuously.

Another important reason to write this post is that I am migrating from my old GeForce 9800 GX2 cards to a couple of the latest GeForce GTX 580 cards with Fermi architecture. This will cost less than one thousand euros, and I will be able to get 3 Tflops in single precision and 1 Tflops in double precision, with more RAM for each GPU. The ambition is to bring my CUDA machine to the computational capabilities that, in 2007, made a breakthrough in lattice studies of the propagators of Yang-Mills theory. The main idea is to have code for both Yang-Mills and scalar field theories running under CUDA, comparing their quantum behavior in the infrared limit, an idea pioneered quite recently by Rafael Frigori (see here). Rafael showed through lattice computations that my mapping theorem (see here and references therein) also holds in 2+1 dimensions.

The GeForce GTX 580 cards I bought are from MSI (see here). These cards are overclocked with respect to the standard product and come at a very convenient price. I should say that my hardware is already stable and I am able to do productive work right now. But this upgrade will take me into the Fermi architecture, opening up the possibility of double precision on CUDA. I hope to report here in the near future on this new architecture and its advantages.

Nuno Cardoso & Pedro Bicudo (2010). SU(2) Lattice Gauge Theory Simulations on Fermi GPUs. J. Comput. Phys. 230:3998-4010, 2011. arXiv: 1010.4834v2

Rafael B. Frigori (2009). Screening masses in quenched (2+1)d Yang-Mills theory: universality from dynamics? Nucl. Phys. B 833(1-2):17-27, 2010. arXiv: 0912.2871v2

Marco Frasca (2010). Mapping theorem and Green functions in Yang-Mills theory. PoS(FacesQCD)039, 2011. arXiv: 1011.3643v3

16/02/2011

As promised (see here), I am back to talk about my CUDA machine. I have made the following upgrades:

• Added 4 GB of RAM, so I now have 8 GB of DDR3 RAM clocked at 1333 MHz. This is the maximum allowed by my motherboard.
• Added a third 9800 GX2 graphics card. This one is an XFX, while the other two I had already installed are an EVGA and an Nvidia respectively. These three cards are not perfectly identical, as the EVGA is factory-overclocked and, in any case, the firmware may not be the same on all of them.

At the start of the upgrade things were not so straightforward. Sometimes the BIOS complained at boot about the position of the cards in the three PCI Express 2.0 slots, and the system did not start at all. But once I found the right combination by permuting the three cards, Windows 7 recognized all of them, the latest Nvidia drivers installed like a charm, and the Nvidia System Monitor showed the physical status of all the GPUs. Heat is a concern here, as the video cards run at about 70 °C while the rest of the hardware is at about 50 °C. The case is always open and I intend to keep it so, to reduce the risk of overheating to a minimum.

The main problem arose when I tried to run my CUDA applications from a command window. I have a simple program that just enumerates the GPUs in the system, and the lattice program by Pedro Bicudo and Nuno Cardoso can also check the system to identify the exact set of resources it needs to perform at its best. Both applications, which I recompiled on the upgraded platform, saw just a single GPU. At first, it was impossible to get meaningful behavior from the system. I thought this could be a hardware problem and contacted XFX support about my motherboard. I bought my motherboard second-hand, but I was able to register the product thanks to the seller, who had already done so. The people at XFX were very helpful and fast in answering. The technician essentially told me that the system should work, and gave me some advice to identify possible problems. Recall that a 9800 GX2 contains two GPUs, so I have six GPUs to work with. I checked the whole system again until I got the nice configuration above, with Windows 7 seeing all the cards. Just one question remained unanswered: why did my CUDA applications not see the right number of GPUs? This had been an old problem for Nvidia and was solved with a driver revision long before I tried for myself. Currently, my driver is 266.58, the latest one. The solution came unexpectedly. It was enough to change the multi-GPU setting in the Performance menu of the Nvidia monitor, and I got back 5 GPUs instead of just 1. This is not six, but I fear I cannot do better. The applications now work fine. I recompiled them all and successfully ran the lattice computation up to a $76^4$ lattice in single precision! With these numbers I am already able to do professional work in lattice computations at home.

Then I spent some time setting up the development environment with the Parallel Nsight debugger and Visual Studio 2008 for 64-bit applications. So far, I have been able to build the executable of the lattice simulation under VS 2008. My aim is to debug it to understand why some values in the output become zero when they should not. I would also like to understand why the new version of the lattice simulation that Nuno sent me does not seem to work properly on my platform. It took me some time to configure Parallel Nsight for my machine. You will need at least two graphics cards to run it, and you have to activate PhysX in the Nvidia Performance monitor on the card that will not run your application. This was a simple enough task, as the debugger's online manual is well written. Also, the enclosed examples are very useful. I will spend next weekend fine-tuning everything and starting to do some work with the lattice simulation.

As I go further with this activity, I will keep you informed on my blog. If you want to start such an enterprise yourself, feel free to get in touch with me to overcome the difficulties and hurdles you will encounter. Things surely proved to be less complicated than they appeared at the start.