## CUDA: Upgrading to 3 Tflops

29/03/2011

When I was a graduate student I heard a lot about the wonderful performance of the Cray-1 parallel computer and the promise of exploring unknown fields of knowledge with this unleashed power. This admirable machine reached a peak of 250 Mflops. Its successor, the Cray-2, performed at 1700 Mflops, and for scientists this was indeed a new era in attacking difficult mathematical problems. But from the point of view of QCD all these machines look like toys for a kindergarten: one cannot perform even the simplest computations needed to extract meaningful physical results. So, physicists started to design very specialized machines in the hope of improving the situation.

Today the situation has changed dramatically. The reason is that the growing demand for computation to render complex video output requires extensive parallel capability for very simple mathematical operations. But these operations are all one needs to perform scientific computations. The flagship company in this area is Nvidia, which produced CUDA for its graphics cards. This means that today one can have outstanding parallel computation on a desktop computer, and we are talking about a capability of some Teraflops! All this at a very affordable cost. For a few bucks you can have on your desktop a machine performing a thousand times better than a legendary Cray machine. Now, the counterpart of a Cray-1 is a CUDA cluster breaking the Petaflops barrier! Something people were dreaming of just a few years ago. This means that you can do complex and meaningful QCD computations in your office, whenever you like, without the need to share CPU time with anybody, pushing your machine to its limits. All this at costs that are no longer a concern.

So, with this opportunity in sight, I jumped on the bandwagon and a few months ago I upgraded my desktop computer at home into a CUDA supercomputer. The first idea was just to buy old hardware on Ebay at very low cost to build on what was already in my machine. In 2008 the top of the Nvidia GeForce cards was the 9800 GX2. This card comes equipped with two GPUs with 128 cores each, 0.5 Gbyte of RAM per GPU and support for CUDA architecture 1.1. No double precision is available: this option only appeared some time later, with cards having CUDA architecture 1.3. You can find one of these cards on Ebay for about 100-120 euros. You will also need a proper motherboard. Indeed, again in 2008, Nvidia produced the nForce 790i Ultra, well suited to this aim. This board supports a 3-way SLI configuration and, as my readers know, I installed up to three 9800 GX2 cards on it. I got this board on Ebay at a price similar to that of the video cards. Also, before starting this adventure, I already had a 750 W Cooler Master power supply. It did not take much time to have this hardware up and running, reaching the considerable computational power of 2 Tflops in single precision, all this with hardware at least 3 years old! For the operating system I chose Windows 7 Ultimate 64 bit after an initial failure with Ubuntu Linux 64 bit.
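To put the figures quoted in this post side by side, here is a minimal Python sketch of the arithmetic. It uses only numbers that appear in the text (1 Tflops per 9800 GX2, 1.5 Tflops per 580 GTX, 250 Mflops for the Cray-1, 1700 Mflops for the Cray-2). The function names and the choice of two installed cards per configuration are assumptions made for illustration, and a simple sum of nominal peaks says nothing about sustained performance on real lattice code.

```python
# Peak single-precision throughput per card, in Tflops, as quoted in this post.
PER_CARD_TFLOPS = {
    "GeForce 9800 GX2": 1.0,
    "GeForce 580 GTX": 1.5,
}

# Cray peak figures quoted in this post, converted from Mflops to Tflops.
CRAY_TFLOPS = {
    "Cray-1": 250e6 / 1e12,
    "Cray-2": 1700e6 / 1e12,
}

# Assumption for illustration: two cards installed in each configuration.
N_CARDS = 2


def aggregate_tflops(card, n_cards=N_CARDS):
    """Nominal peak of n_cards identical cards (simple sum, no overheads)."""
    return PER_CARD_TFLOPS[card] * n_cards


def ratio_to_cray(card, cray, n_cards=N_CARDS):
    """How many times the nominal peak exceeds the quoted Cray figure."""
    return aggregate_tflops(card, n_cards) / CRAY_TFLOPS[cray]


if __name__ == "__main__":
    for card in PER_CARD_TFLOPS:
        print(f"{N_CARDS} x {card}: {aggregate_tflops(card):.1f} Tflops peak, "
              f"{ratio_to_cray(card, 'Cray-1'):.0f} x Cray-1, "
              f"{ratio_to_cray(card, 'Cray-2'):.0f} x Cray-2")
```

With two cards this reproduces the 2 Tflops and 3 Tflops totals mentioned in this post; changing `N_CARDS` shows what a different number of cards would nominally give.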

There is a wide choice of software on the web to run QCD. The most widespread is surely the MILC code. This code is written for a multi-processor environment and represents the effort of several people over several years of development. It is well written and rather well documented. A lot of papers on lattice QCD based on this code have appeared in the most relevant archival journals. Quite recently its developers started to port the code to CUDA GPUs, following a trend common to all academia. Of course, being a lone user of CUDA with not much time for development, I faced the not very attractive prospect of porting this code to GPUs myself. But, at the same time as I upgraded my machine, Pedro Bicudo and Nuno Cardoso published their paper on arxiv (see here) and promptly made available their code for SU(2) QCD on CUDA GPUs. You can download their up-to-date code here (if you plan to use this code just let them know, as they are very helpful). So, I ported this code, originally written for Linux, to Windows 7 and got it up and running, obtaining correct output for lattices up to $56^4$, working just in single precision as, for this hardware configuration, no double precision was available. The execution time was acceptable: a few seconds on the GPUs, plus some more at the start of the program due to exchanges between CPU and GPUs. So, already at this stage I am able to be productive at a professional level with lattice computations. Just a little complaint is in order here. On the web it is very easy to find good code to perform lattice QCD, but nothing can be found for post-processing of configurations. This code is as important as the former: without computation of observables one can do nothing with configurations or whatever else lattice QCD yields, however powerful the machine.
So, I think it would be worthwhile to have both kinds of code available, to get spectra, propagators and so on starting from a standard configuration file, independently of the program that generated it. Similarly, it appears almost impossible to get lattice code for computations in lattice scalar field theory (many thanks to Colin Morningstar for providing me with code for 2+1 dimensions!). This is a workhorse for people learning lattice computation, and it would be helpful, at least for pedagogical reasons, to make it available in the same way QCD code is. But now I leave complaints aside and go to the most interesting part of this post: the upgrade.

In these days I made another effort to improve my machine. The idea is to improve performance, with larger lattices and shorter execution times, while reducing overheating and noise. Besides, the hardware I worked with was so old that its architecture did not provide double precision. So, I decided to buy a couple of GeForce 580 GTX cards. This is the top of the GeForce cards (the 590 GTX is a couple of 580 GTX on a single card) and yields 1.5 Tflops in single precision (the 9800 GX2 stopped at 1 Tflops in single precision). It has the Fermi architecture (CUDA 2.0) and grants double precision at a possible performance of at least 0.5 Tflops. But, as happens for all video cards, a model has several manufacturers, and these may decide to change something in performance. After some difficulties with the dealer, I was able to get a couple of high-performance MSI N580GTX Twin Frozr II/OC cards at a very convenient price. With respect to the original Nvidia card, these come overclocked, with a proprietary cooling system that lowers the temperature by 19°C with respect to the original card. Besides, higher quality components were used. I received these cards yesterday and installed them immediately. In a few minutes Windows 7 installed the drivers. I recompiled my executable and finally performed a successful computation up to $66^4$ with the latest version of Nuno and Pedro's code. Then I checked the temperature of the cards with Nvidia System Monitor and saw a temperature of 60°C for each card, with the cooler working at 106%. This was at least 24°C lower than my 9800 GX2 cards! Execution times on the GPUs were at least halved. This new configuration grants 3 Tflops in single precision and at least 1 Tflops in double precision. My present hardware configuration is the following:

So far, I have not had much time to experiment with the new hardware. I hope to tell you more in the near future. Just stay tuned!
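As a rough check on the lattice sizes quoted in this post ($56^4$ and $66^4$), the following Python sketch estimates the memory needed just to hold one SU(2) gauge configuration. It rests on stated assumptions: each link matrix is stored as 4 real numbers (a common parametrization of SU(2)), there are 4 links per site on a 4-dimensional lattice, and no extra buffers or temporaries are counted. The function names are hypothetical and the actual storage layout of Nuno and Pedro's code may differ.

```python
BYTES_PER_REAL = {"single": 4, "double": 8}


def su2_lattice_bytes(side, precision="single", dims=4, reals_per_link=4):
    """Bytes for one gauge configuration on a side**dims lattice.

    Assumes one link per site and direction, and reals_per_link real
    numbers per SU(2) link matrix (4 is an assumption, see the lead-in).
    """
    sites = side ** dims
    links = sites * dims
    return links * reals_per_link * BYTES_PER_REAL[precision]


def to_gib(n_bytes):
    """Convert bytes to GiB (2**30 bytes)."""
    return n_bytes / 2**30


if __name__ == "__main__":
    for side in (56, 66):
        for precision in ("single", "double"):
            size = to_gib(su2_lattice_bytes(side, precision))
            print(f"{side}^4, {precision} precision: {size:.2f} GiB")
```

Under these assumptions a $56^4$ lattice in single precision takes about 0.59 GiB and a $66^4$ one about 1.13 GiB, so the first already exceeds the 0.5 Gbyte of a single 9800 GX2 GPU and would have to be spread over several GPUs.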

Nuno Cardoso, & Pedro Bicudo (2010). SU(2) Lattice Gauge Theory Simulations on Fermi GPUs J.Comput.Phys.230:3998-4010,2011 arXiv: 1010.4834v2

## Dispersive Wiki

26/03/2011

Since I was seventeen my great passion has been the solution of partial differential equations. I used an old book written by Italian mathematicians to face for the first time the technique of separation of variables applied to the free Schrödinger equation. The article was written by Paolo Straneo, professor at the University of Genova in the first part of the last century and a friend of Einstein, and through it I was exposed to quantum theories in a not too simple way. When I was eighteen, some friends of mine, during my vacation in Cambridge, gave me my first mathematics book on PDEs: François Treves, Basic Linear Partial Differential Equations. You can find this book at low cost from Dover (see here).

Since then I have never given up my passion for this fundamental part of mathematics, and today I am a professional in this area of research. As such, I find important references in the work of Terry Tao (see also his blog), the Fields medalist. Terry, together with Jim Colliander at the University of Toronto, manages a Wiki, Dispersive Wiki, with the aim of collecting all the knowledge about the differential equations that are at the foundation of dispersive effects. Most of you met the wave equation early in your studies. Well, this represents a very good starting point. On the other hand, it would be helpful to add some contributions on the Einstein or Yang-Mills equations. Indeed, Dispersive Wiki is open to all people who, like me, are addicted to PDEs and everything around them.

I have had the chance to write some contributions to Dispersive Wiki. Currently, I am writing some lines on the Yang-Mills equations (I did it before but this was judged to be self-promotion… just look at the discussion there), the Dirac-Klein-Gordon equations and other articles. I think it is important to help Jim and Terry in their endeavor, as PDEs are the bread and butter of our profession, and having such a record of results on-line would be extremely useful. Just take the time to have a look.

## Low-energy effective Yang-Mills theory

22/03/2011

As usual I read the daily listing from arxiv, and I often find very interesting papers. This is the case for a new paper by Kei-Ichi Kondo. Kondo was in Ghent last year (here his talk) and I had the chance to meet him. His research runs along lines very similar to mine. A relevant paper by him is about the derivation of the Nambu-Jona-Lasinio model from QCD (see here), with insights similar to those I presented in recent papers (see here and here). This new paper by Kondo presents a relevant attempt to derive a consistent low-energy effective Yang-Mills theory from the full Lagrangian. The idea is to decompose the gauge field into two components and integrate out the one that only contributes to the high-energy behavior of the theory. Kondo shows how a mass term can be introduced at the expense of breaking BRST symmetry. This symmetry can be recovered at the cost of nilpotency. But this mass term is gauge invariant and gives rise to a meaningful propagator for the theory. Then, the computations show that Wilson’s area law is satisfied, granting quark confinement, and that reflection positivity is violated for the gluon propagator, granting gluon confinement too. The gluon propagator is then given in a Gribov-Stingl form

$D(p)=\frac{1+d_1p^2}{c_0+c_1p^2+c_2p^4}$

but this form is only recovered if the mass term is introduced in the original Yang-Mills Lagrangian, as said above. It is interesting to note that, if this is the right propagator, a Nambu-Jona-Lasinio model could anyhow be derived by taking the small-momentum limit. A couple of observations are in order here. Firstly, the fits by Cucchieri and Mendes often recover this functional form (e.g. see here). Last but not least, this functional form is acausal but produces a confining potential increasing with distance. But even if the mass term were zero and no Gribov-Stingl form were obtained, Kondo shows that the area law still holds and one still has confinement. As a final conclusion, Kondo shows that his effective model describes confinement through a dual Meissner effect, a hypothesis that came out at the dawn of the studies in QCD.
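The remark about the small-momentum limit can be made explicit with a short expansion of the Gribov-Stingl form given above. This is only a sketch, assuming $c_0\neq 0$; reading the leading term as a contact interaction is the standard interpretation and not a step taken from Kondo's paper:

$D(p)=\frac{1}{c_0}\left[1+\left(d_1-\frac{c_1}{c_0}\right)p^2+O(p^4)\right]$

At leading order the propagator reduces to the constant $1/c_0$, which in coordinate space is a delta function, that is, a contact interaction of the kind that appears in a Nambu-Jona-Lasinio model.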

This paper represents a fine piece of work. A point to be clarified is how this approach should change, given that other studies and lattice computations show that the gluon mass arises dynamically, and, most importantly, how the form of the propagator changes. I just suspect that my conclusions about this matter would be recovered.

Kei-Ichi Kondo (2011). A low-energy effective Yang-Mills theory for quark and gluon confinement arxiv arXiv: 1103.3829v1

Kei-Ichi Kondo (2010). Toward a first-principle derivation of confinement and chiral-symmetry-breaking crossover transitions in QCD Phys. Rev. D 82, 065024 (2010) arXiv: 1005.0314v2

Marco Frasca (2010). Glueball spectrum and hadronic processes in low-energy QCD Nucl.Phys.Proc.Suppl.207-208:196-199,2010 arXiv: 1007.4479v2

Marco Frasca (2008). Infrared QCD International Journal of Modern Physics E 18, (2009) 693-703 arXiv: 0803.0319v5

Attilio Cucchieri, & Tereza Mendes (2009). Landau-gauge propagators in Yang-Mills theories at beta = 0: massive solution versus conformal scaling Phys.Rev.D81:016005,2010 arXiv: 0904.4033v2

## Sidney Coleman’s QFT lectures

21/03/2011

This post is just to point out to my readers that Sidney Coleman’s lectures on QFT are now available in TeX and pdf format. I took this information from Lubos’ site. The link to the full pdf is this. For this excellent work we should be grateful to Bryan Chen, a former student of Lubos. These lectures give an idea of Coleman’s greatness also as a teacher.

For the readers of my blog pursuing active research in the same areas as mine, a relevant paper by him is “There are no classical glueballs”. Coleman’s conclusion goes like

…there are no finite-energy non-singular solutions of classical Yang-Mills theory in four-dimensional Minkowski space that do not radiate energy out to spatial infinity…

and this agrees quite well with my exact solutions that do not seem to have finite energy (see here) and so, Coleman’s theorem is evaded. Indeed, if you want to have a field to generate a mass, you will need either a finite volume or a running coupling.

Coleman, S. (1977). There are no classical glueballs Communications in Mathematical Physics, 55 (2), 113-116 DOI: 10.1007/BF01626513

Marco Frasca (2009). Exact solutions of classical scalar field equations arxiv arXiv: 0907.4053v2

## CUDA: Lattice QCD at your desktop

15/03/2011

As my readers know, I have built a CUDA machine on my desktop for a few bucks to have lattice QCD at home. There are a couple of reasons to write this post, and the most important of these is that Pedro Bicudo and Nuno Cardoso have got their paper published in an archival journal (see here). They produced very good code to run SU(2) lattice QCD on a CUDA machine (download link), which I have got up and running on my computer. They are working on the SU(3) version, which is almost ready. I hope to say more about this in the very near future. Currently, I am porting to my machine the MILC code for the computation of the gluon propagator from the configurations I am able to generate with Nuno and Pedro’s code. This MILC code fits my needs quite well and is very well written. This task will take me some time, and unfortunately I do not have much of it.

Presently, Nuno and Pedro’s code runs perfectly on my machine (see my preceding post here). There was no problem in the code: I had just missed a compiler option needed to make the GPUs communicate through the MPI library. Once I corrected this, everything ran like a charm. From a hardware standpoint, I was unable to get my machine working perfectly with three cards, and the reason was just overheating. A chip on the motherboard ended up below one of the video cards, resulting in erratic behavior of the chipset. Windows 7 saw a floppy disc when I have none! So, I decided to work with just two cards, and now the system works perfectly, is stable, and Windows 7 always sees four GPUs.

Nuno sent me an updated version of their code. I will get it running as soon as possible. Of course, I expect this port to be as smooth as before and to take just a few minutes of my time. I suggested that he keep their site up to date with the latest version of the code, as it is evolving continuously.

Another important reason to write this post is that I am migrating from my old GeForce 9800 GX2 cards to a couple of the latest GeForce 580 GTX with Fermi architecture. This will cost less than one thousand euros and I will be able to get 3 Tflops in single precision and 1 Tflops in double precision, with more RAM for each GPU. The ambition is to upgrade my CUDA machine to the computational capabilities that, in 2007, made a breakthrough in the lattice studies of the propagators for Yang-Mills theory. The main idea is to have code for both Yang-Mills and scalar field theories running under CUDA, comparing their quantum behavior in the infrared limit, an idea pioneered by Rafael Frigori quite recently (see here). Rafael showed through lattice computations that my mapping theorem (see here and references therein) also holds in 2+1 dimensions.

The GeForce 580 GTX cards that I bought are from MSI (see here). These cards are overclocked with respect to the standard product and come at a very convenient price. I should say that my hardware is already stable and I am able to produce software right now. But this upgrade will take me into the Fermi architecture, opening up the possibility of double precision on CUDA. I hope to report here in the near future about this new architecture and its advantages.

Nuno Cardoso, & Pedro Bicudo (2010). SU(2) Lattice Gauge Theory Simulations on Fermi GPUs J.Comput.Phys.230:3998-4010,2011 arXiv: 1010.4834v2

Rafael B. Frigori (2009). Screening masses in quenched (2+1)d Yang-Mills theory: universality from dynamics? Nuclear Physics B, Volume 833, Issues 1-2, 1 July 2010, Pages 17-27 arXiv: 0912.2871v2

Marco Frasca (2010). Mapping theorem and Green functions in Yang-Mills theory PoS(FacesQCD)039, 2011 arXiv: 1011.3643v3

## No scaling solution with massive gluons

09/03/2011

Some time ago, while I was just at the beginning of my current understanding of low-energy Yang-Mills theory, I wrote to Christian Fischer to ask whether a mass gap could be derived from the scaling solution, the one with the gluon propagator going to zero at lower momenta and the ghost propagator running to infinity faster than that of a free particle in the same limit. Christian has always been very kind in answering my requests for clarification and did the same for this rather specific question, telling me that this was indeed not possible. This is a rather disappointing truth, as we are accustomed to the idea that short-range forces need some kind of massive carriers. But physics has taught us that a first intuition can be wrong, and so I decided not to take this as an argument against the scaling solution. Until today.

Looking at arxiv, I follow with a lot of interest the work of the group collaborating with Philippe Boucaud. They support the decoupling solution, as this is what comes out of their numerical computations with the Dyson-Schwinger equations. A person working with them, Jose Rodríguez-Quintero, is producing several interesting results in this direction, and the most recent ones appear really striking (see here and here). The question Jose is asking is: when and how does a scaling solution appear in solving the Dyson-Schwinger equations? I would like to recall that this kind of solution was found by applying a truncation technique to these equations, and so it is really important to better understand how it emerges. Jose solves the equations with a method recently devised by Joannis Papavassiliou and Daniele Binosi (see here) to get a sensible truncation of the Dyson-Schwinger hierarchy of equations. What is different in Jose’s approach is that he tries an ansatz with a massive propagator (this just means Yukawa-like) and sees under what conditions a scaling solution can emerge. A quite shocking result is that there exists a critical value of the strong coupling that can produce it, but at the price that the Dyson-Schwinger equations no longer converge toward a consistent solution with a massive propagator, the scaling solution representing just an unattainable limiting case. So, the scaling solution implies no mass gap, as Christian already told me a few years ago.

The point is that we now have a lot of evidence that the massive solution is the right one, and there is no physical reason whatsoever to presume that the scaling solution should be the true solution at the critical value found by Jose. So, all this mounting evidence says that the old idea of Hideki Yukawa still works: massive carriers imply limited-range forces.
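Yukawa's idea can be put in numbers with a minimal Python sketch: a carrier of mass $m$ gives a potential falling off as $e^{-r/R}/r$ with range $R=\hbar c/(mc^2)$. The function name is hypothetical, and the charged pion mass used as an example (about 139.57 MeV) is a textbook value brought in for illustration. It is not a figure from this post and says nothing about the value of the gluon mass.

```python
HBAR_C_MEV_FM = 197.327  # hbar * c in MeV * fm


def yukawa_range_fm(mass_mev):
    """Range R = hbar*c / (m*c^2) of a Yukawa potential, in fm.

    mass_mev is the carrier rest energy m*c^2 in MeV and must be positive:
    a massless carrier has infinite range (Coulomb-like potential).
    """
    if mass_mev <= 0:
        raise ValueError("carrier mass must be positive for a finite range")
    return HBAR_C_MEV_FM / mass_mev


if __name__ == "__main__":
    pion_mass_mev = 139.57  # illustrative textbook value, not from this post
    print(f"range for a {pion_mass_mev} MeV carrier: "
          f"{yukawa_range_fm(pion_mass_mev):.2f} fm")
```

The heavier the carrier, the shorter the range; doubling the mass halves $R$.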

J. Rodríguez-Quintero (2011). The scaling infrared DSE solution as a critical end-point for the family of decoupling ones arxiv arXiv: 1103.0904v1

J. Rodríguez-Quintero (2010). On the massive gluon propagator, the PT-BFM scheme and the low-momentum behaviour of decoupling and scaling DSE solutions JHEP 1101:105,2011 arXiv: 1005.4598v2

Daniele Binosi, & Joannis Papavassiliou (2007). Gauge-invariant truncation scheme for the Schwinger-Dyson equations of QCD Phys.Rev.D77:061702,2008 arXiv: 0712.2707v1

## Chiral condensates in a magnetic field: A collaboration

08/03/2011

I have been publishing in refereed journals for more than twenty years and, notwithstanding a lot of exchange with my colleagues, I had never had the chance to work in a collaboration. The opportunity came thanks to Marco Ruggieri (see here). Marco and I met in Ghent at the conference “The Many faces of QCD” (see here, here and here). We had a very good time and discussed a lot about physics. I remember a very nice moment discussing with Attilio Cucchieri and Tereza Mendes in a pub, over a good beer, the history that was taking shape on the question of the propagators for Yang-Mills theory in the Landau gauge. About a month ago, Marco wrote to me about his new work in progress. He was analyzing the behavior of QCD condensates in a magnetic field through a couple of models: the linear sigma model and the Nambu-Jona-Lasinio model. The formalism for doing this was already known in the literature thanks to the works of Ritus, Leung and Wang (see below), who analyzed the solutions of the Dirac equation in a constant magnetic field, also giving the propagator. In our paper we introduce the constant magnetic field into the given phenomenological models through a minimal coupling. It is interesting to note that, while the sigma model is renormalizable, the Nambu-Jona-Lasinio model is not and displays an explicit dependence on a cut-off. This is not a concern here, as this cut-off has a physical meaning in QCD, as one can already see in asymptotic freedom studies. The motivation for this study was mainly a lattice analysis of this kind of physical situation (see here). The point is that some kinds of condensates can form only in the presence of the external magnetic field. We were able to recover the values of the magnetic susceptibility and the dependence of the chiral condensate on the magnetic field in the limits of small and large fields. Besides, we obtained an evaluation of the magnetic moment. The agreement with lattice computations is fairly good.

What I have learned from this work is that the use of phenomenological models, particularly their choice, can entail some difficulties with the expected behavior of QCD. First of all, the sigma model and the Nambu-Jona-Lasinio model are not so different: one can be obtained from the other through bosonization techniques. But while the latter cannot be renormalized, implying a contact interaction and a dimensional coupling, the former can. A curious result I obtained working on this paper with Marco is that the Yukawa model, written down as a non-self-interacting scalar field interacting with a massless Dirac field, can be easily transformed into a Nambu-Jona-Lasinio model, giving rise to chiral symmetry breaking! Had Hideki Yukawa known this, his breakthrough would have been enormous. On the other hand, a sigma model is always renormalizable, and this implies that any final result of a computation from it is independent of any cut-off used to regularize the theory. This is not what is seen in QCD, where a physical scale depending on energy emerges naturally by integrating the equations of motion, as already said above. Besides, condensates do depend explicitly on such a cut-off, and this means that regularizing a sigma model to describe QCD at very low energies implies a deviation from physical results. Last but not least, scalar models are trivial at low energies, but we know that this is not the case for QCD, which has the running coupling reaching a non-trivial fixed point in the infrared limit. For a Nambu-Jona-Lasinio model this is not a concern, as it holds when the infrared limit is already reached with a fixed value of the strong force coupling. My personal view is that one should always use a Nambu-Jona-Lasinio model and reduce it to a sigma model through a bosonization procedure, so as to fix all the parameters of the theory with the physical ones.
In this sense, the renormalizability of the sigma model will be helpful to correctly represent mesons and all the low-energy phenomenology. The reason for this is quite simple: the Nambu-Jona-Lasinio model is the right low-energy limit of QCD.
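To illustrate how a Yukawa coupling can turn into a contact interaction with a dimensional coupling, here is the textbook tree-level argument. It is only a sketch under an extra assumption, namely that the scalar field has a mass $m$ and that momenta are small compared with it; it is not necessarily the route followed in our paper. Start from

$\mathcal{L}=\bar\psi i\gamma^\mu\partial_\mu\psi+\frac{1}{2}(\partial\phi)^2-\frac{1}{2}m^2\phi^2-g\phi\bar\psi\psi$

Neglecting the derivatives of $\phi$, its equation of motion gives

$\phi\approx-\frac{g}{m^2}\bar\psi\psi$

and substituting back one gets a four-fermion term of Nambu-Jona-Lasinio type,

$\mathcal{L}_{eff}\approx\bar\psi i\gamma^\mu\partial_\mu\psi+\frac{g^2}{2m^2}(\bar\psi\psi)^2$

The effective coupling $G=g^2/(2m^2)$ has the dimension of an inverse squared mass, which is the dimensional coupling of the contact interaction mentioned above.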

Marco Frasca, & Marco Ruggieri (2011). Magnetic Susceptibility of the Quark Condensate and Polarization from Chiral Models arxiv arXiv: 1103.1194v1

RITUS, V. (1972). Radiative corrections in quantum electrodynamics with intense field and their analytical properties Annals of Physics, 69 (2), 555-582 DOI: 10.1016/0003-4916(72)90191-1

C. N. Leung, & S. -Y. Wang (2005). Gauge independent approach to chiral symmetry breaking in a strong magnetic field Nucl.Phys.B747:266-293,2006 arXiv: hep-ph/0510066v3

P. V. Buividovich, M. N. Chernodub, E. V. Luschevskaya, & M. I. Polikarpov (2009). Chiral magnetization of non-Abelian vacuum: a lattice study Nucl.Phys.B826:313-327,2010 arXiv: 0906.0488v2