## That strange behavior of supersymmetry…

07/12/2013

I am a careful reader of the scientific literature and an avid searcher for material already published in peer-reviewed journals. Of course, arXiv is essential to accomplish this task and to satisfy my need for reading. These days I am working on Dyson-Schwinger equations. I wrote a paper on this (see here) a few years ago, but that work is in strong need of revision. Maybe, one of these days, I will take up the challenge. Googling around and looking for Dyson-Schwinger equations applied to the well-known supersymmetric model due to Wess and Zumino, I uncovered a very exciting line of research that uses Dyson-Schwinger equations to produce exact results in quantum field theory. The paper I found was authored by Marc Bellon, Gustavo Lozano and Fidel Schaposnik and can be found here. These authors derive the Dyson-Schwinger equations for the Wess-Zumino model at one loop and manage to compute the self-energies of the fields involved: a scalar, a fermion and an auxiliary bosonic field. Their equations involve three self-energies, one for each field. Self-energies are essential in quantum field theory as they introduce corrections to the masses in a propagator and so enter into the physical part of an object that is not itself an observable.

Now, if you are in a symmetric theory like the Wess-Zumino model, such a symmetry, if it is not broken, will yield equal masses for all the components of the multiplet entering the theory. This means that if you start with the assumption that all the self-energies are equal, you are making a consistent approximation. This is just what Bellon, Lozano and Schaposnik did. They assumed from the start that all the self-energies in the Dyson-Schwinger equations they obtained are equal and went on with their computations. This choice leaves an open question: What if I choose different self-energies from the start? Will the Dyson-Schwinger equations drive the solution toward the symmetric one?

This question is really interesting as the model considered is not exactly the one that Witten analysed in his famous 1982 paper on supersymmetry breaking (you can download his paper here). A supersymmetric model generates non-linear terms and could be amenable to spontaneous symmetry breaking, provided the Witten index has the proper value. The question I asked is strongly related to the idea of a bootstrap breaking of supersymmetry: Supersymmetry is responsible for its own breaking.

So, I managed to solve numerically the Dyson-Schwinger equations for the Wess-Zumino model, as given by Bellon, Lozano and Schaposnik, and presented the results in a paper (see here). If you solve them assuming from the start that all the self-energies are equal, you get the following figure for the coupling running from 0.25 to 100 (weak to strong):

It does not matter how you modify your parameters in the Dyson-Schwinger equations. Choosing the self-energies all equal from the start keeps them equal forever. This is a consistent choice and this solution exists. But now, try choosing all different self-energies. You will get the following figure for the same couplings:

This is really nice. You see that there also exist solutions with all different self-energies, so supersymmetry may be broken in this model. This kind of solution was missed by the authors. What one can see here is that supersymmetry is preserved for small couplings, even if we start with all different self-energies, but is broken as the coupling becomes stronger. This result is really striking and unexpected. It is in agreement with the results presented here.

I hope to extend this analysis to more mundane theories to analyse behaviours that are currently discussed in the literature but have never been checked. For these aims there are very powerful tools developed for Mathematica by Markus Huber, Jens Braun and Mario Mitter to derive and numerically solve Dyson-Schwinger equations: DoFun and CrasyDSE (thanks to Markus Huber for help). I suggest playing with them for numerical explorations.

Marc Bellon, Gustavo S. Lozano, & Fidel A. Schaposnik (2007). Higher loop renormalization of a supersymmetric field theory. Phys. Lett. B 650, 293-297. arXiv: hep-th/0703185v1

Edward Witten (1982). Constraints on Supersymmetry Breaking. Nuclear Physics B, 202, 253-316. DOI: 10.1016/0550-3213(82)90071-2

Marco Frasca (2013). Numerical study of the Dyson-Schwinger equations for the Wess-Zumino model. arXiv: 1311.7376v1

Marco Frasca (2012). Chiral Wess-Zumino model and breaking of supersymmetry. arXiv: 1211.1039v1

Markus Q. Huber, & Jens Braun (2011). Algorithmic derivation of functional renormalization group equations and Dyson-Schwinger equations. Computer Physics Communications, 183 (6), 1290-1320. arXiv: 1102.5307v2

Markus Q. Huber, & Mario Mitter (2011). CrasyDSE: A framework for solving Dyson-Schwinger equations. arXiv: 1112.5622v2

## Fooling with mathematicians

28/02/2013

I am still working with stochastic processes and, as my readers know, I have proposed a new view of quantum mechanics assuming that a meaning can be attached to the square root of a Wiener process (see here and here). I was able to generate it through numerical code. A square root of a number can always be taken, irrespective of any deep and beautiful mathematical analysis. The reason is that this is something really new and deserves a different approach, much in the same way as happened to Dirac’s delta, which initially met with skepticism from the mathematical community (it simply did not make sense with the knowledge of the time). Here I give you some Matlab code if you want to try it yourselves:

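```matlab
nstep = 500000;                          % number of time steps
dt = 50;                                 % total time span
t = 0:dt/nstep:dt;                       % time grid (nstep+1 points)
B = normrnd(0,sqrt(dt/nstep),1,nstep);   % Gaussian increments, variance dt/nstep
dB = cumsum(B);                          % Brownian path (cumulative sum of increments)
% Square root of the Brownian motion (complex wherever dB < 0)
dB05 = (dB).^(1/2);
```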

Nothing can prevent you from taking the square root of a number such as a Brownian displacement, and so all this has a perfectly sound meaning numerically. The point is just to understand how to give this a full mathematical meaning. The wrong approach in this case is to throw it all away, claiming that none of this exists. This is exactly the behavior I met from Didier Piau. Of course, Didier is a good mathematician but he simply refuses to accept the possibility that such concepts can have a meaning at all, based on what has been codified so far in the area of stochastic processes. This is notwithstanding the fact that they can be easily computed on your personal computer at home.
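For readers without Matlab, the same numerical experiment can be sketched in Python using only the standard library. This is a minimal illustration, not the code used for the papers: the function name `brownian_path`, the seed and the reduced number of steps are assumptions made here. As in the Matlab listing, where `.^(1/2)` returns complex numbers for negative entries, the square root is taken in the complex plane with `cmath.sqrt`.

```python
import cmath
import math
import random

def brownian_path(nstep, total_time, seed=0):
    """Cumulative sum of independent Gaussian increments with variance total_time/nstep."""
    rng = random.Random(seed)
    sigma = math.sqrt(total_time / nstep)
    path, w = [], 0.0
    for _ in range(nstep):
        w += rng.gauss(0.0, sigma)
        path.append(w)
    return path

# Smaller than the 500000 steps of the Matlab listing, to keep the run short.
path = brownian_path(nstep=5000, total_time=50.0)

# Pointwise complex square root: real where the path is positive,
# purely imaginary where it is negative.
sqrt_path = [cmath.sqrt(w) for w in path]

# Squaring the result recovers the original path up to rounding.
max_error = max(abs(z * z - w) for z, w in zip(sqrt_path, path))
```

The check on `max_error` only confirms that the pointwise operation is well defined numerically, which is the claim made in the text; it says nothing about the mathematical status of the limiting process.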

But this saga is not over yet. This time I was trying to compute the cubic root of a Wiener process and I posted this at Mathematics Stackexchange. I asked this question with the simple idea in mind of considering a stochastic process with a random mean and I did not realize that I was provoking a small crisis again. This time the question is the existence of the process ${\rm sign}(dW)$. Didier Piau immediately wrote that it does not exist. Again, I give here the Matlab code that computes it very easily:

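```matlab
nstep = 500000;                          % number of time steps
dt = 50;                                 % total time span
t = 0:dt/nstep:dt;                       % time grid (nstep+1 points)
B = normrnd(0,sqrt(dt/nstep),1,nstep);   % Gaussian increments, variance dt/nstep
dB = cumsum(B);                          % cumulative sum of the increments
% Sign and absolute value of a Wiener process
dS = sign(dB);                           % +1 or -1 entry by entry
dA = dB./dS;                             % absolute value, since dB = dS.*dA
```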

Didier Piau and a colleague of his just complain about the way Matlab performs the sign operation. My view is that it is all legitimate, as Matlab takes + or – depending on the sign of the displacement, something that can be done by hand and that does not imply anything exotic. What is exotic here is the strong opposition this evidence meets, even though it is easily understandable by everybody and, of course, easily computable on a desktop computer. The expected distribution for the signs of Brownian displacements is a Bernoulli with p=1/2. Here is the histogram from the above code

This has mean 0 and variance 1, as it should for $N=\pm 1$ and $p=\frac{1}{2}$, but this can be verified after some Monte Carlo runs. This is in agreement with what I discussed here at Mathematics Stackexchange, as a displacement in a Brownian motion is a physical increment or decrement of the position of the moving particle and has a sign that can be managed statistically. My attempt to compare all this to the case of Dirac’s delta turned into a complaint of overstatement, as the delta was really useful and my approach is not (but when Dirac put forward his idea this was just airy-fairy for the time). Of course, a reformulation of quantum mechanics would be a rather formidable support for all this, but this mathematician does not seem to realize it.
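The Monte Carlo check mentioned above can be sketched as follows, in Python with the standard library only. The seed and the sample size are assumptions chosen here for illustration. The sketch applies the sign to the independent Gaussian increments (the array `B` of the Matlab listing), which is the case where the Bernoulli statement with $p=\frac{1}{2}$ applies directly.

```python
import math
import random

rng = random.Random(12345)          # fixed seed, chosen only for reproducibility
nstep = 100_000
total_time = 50.0
sigma = math.sqrt(total_time / nstep)

# Independent Gaussian displacements with variance total_time/nstep.
increments = [rng.gauss(0.0, sigma) for _ in range(nstep)]
# An exact zero has no sign; it practically never occurs, but drop it to be safe.
increments = [x for x in increments if x != 0.0]

# Sign and absolute value, so that increment = sign * magnitude.
signs = [1.0 if x > 0 else -1.0 for x in increments]
magnitudes = [x / s for x, s in zip(increments, signs)]

n = len(signs)
mean_sign = sum(signs) / n
var_sign = sum((s - mean_sign) ** 2 for s in signs) / n
```

For values $\pm 1$ taken with probability $\frac{1}{2}$ each, the exact mean is $0$ and the exact variance is $1$, so `mean_sign` and `var_sign` should approach these values as the sample grows.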

So, in the end, I am somewhat surprised by the behavior of the community towards novelties. I can understand skepticism, it belongs to our profession, but when facing new concepts whose existence can be easily checked numerically I would prefer a more constructive attitude, one of trying to understand rather than an immediate dismissal. It appears as if the history of science never taught anything, leaving us with a boring repetition of stereotyped reactions to something that would instead be worth further consideration. Meanwhile, I hope my readers will enjoy playing around with these new computations using some exotic mathematical operations on a stochastic process.

Marco Frasca (2012). Quantum mechanics is the square root of a stochastic process. arXiv: 1201.5091v2

## Back to CUDA

11/02/2013

It was about two years ago that I wrote my last post about CUDA technology by NVIDIA (see here). At that time I added two new graphics cards to my PC, being on the verge of reaching 3 Tflops in single precision for lattice computations. Unfortunately, I had an unlucky turn of events: these cards went back to the seller as they were not working properly and I was fully refunded. Meanwhile, the motherboard also failed and the hardware was largely changed, so I went for a long time without the opportunity to work with CUDA and perform intensive computations as I had planned. As is well known, one can find a lot of software exploiting this excellent technology provided by NVIDIA and, during these years, it has spread widely, both in academia and industry, making the life of researchers a lot easier. Personally, I am also using it at my workplace and it is really exciting to have such computational capability at hand at a really affordable price.

Now, I am again able to equip my personal computer at home with a powerful Tesla card. Some of these cards are currently being decommissioned as they are at the end of their service life, replaced by more modern ones, and so can be found at a really low price on auction sites like eBay. So, I bought a Tesla M1060 for about 200 euros. As the name says, this card was not conceived for a personal computer but rather for servers produced by some OEMs. This can also be seen when we look at the card and notice its passive cooler. This means that the card has the proper physical dimensions to fit into a server, while active dissipation through fans should be provided by the server itself. Indeed, I added an 80mm Enermax fan to my chassis (also Enermax Enlobal) to make sure that the motherboard temperature does not reach too high values. My motherboard is an ASUS P8P67 Deluxe. This is a very good board, as usual for ASUS, providing three PCIe 2.0 slots so that, in principle, one can install up to three video cards together. But if you have a couple of NVIDIA cards in SLI configuration, the slots work at x8. A single video card will work at x16. Of course, if you plan to work with these configurations, you will need a proper PSU. I have a Cooler Master Silent Pro Gold 1000 W, which is well beyond my needs. This is what remains from my preceding configuration and is performing really well. I have also changed my CPU, which is now an Intel i3-2125 with two cores at 3.30 GHz and 3 MB of cache. Finally, I added 16 GB of Corsair Vengeance DDR3 RAM.

The installation of the card went really smoothly and I got it up and running in a few minutes on Windows 8 Pro 64 bit, after the installation of the proper drivers. I checked it with Matlab 2011b and the PGI compilers, with CUDA Toolkit 5.0 properly installed. All worked fine. I would like to spend a few words on the PGI compilers, which are produced by The Portland Group. I have a trial license at home and tested them, while at my workplace we have a full license. These compilers make the production of accelerated CUDA code absolutely easy. All you need is to insert some compiler directives into your C or Fortran code. I have run some performance tests and the gain is really impressive, without ever writing a single line of CUDA code. These compilers can be easily used with Matlab to produce mex-files or S-functions, even if they are not yet supported by Mathworks (they should be!), and I have verified this too without too much difficulty, both for C and Fortran.

Finally, I would like to give you an idea of the way I will use CUDA technology for my aims. What I am doing right now is porting some good code for the scalar field and I would like to use it in the limit of large self-interaction to derive the spectrum of the theory. It is well known that if you take the limit of the self-interaction going to infinity you recover the Ising model. But I would like to see what happens for intermediate but large values, as I was not able to get any hint from the literature on this, even though this is the workhorse for everybody doing lattice computations. What seems to matter today is to show triviality in four dimensions, for which the evidence is well established. As soon as the accelerated code runs properly, I plan to share it here, as it is very easy to get good code to do lattice QCD but very difficult to get good code for scalar field theory. Stay tuned!

## Turing machine and Landauer limit

04/06/2012

My period of inactivity was due to a lack of real news around the world. But I was not inactive at all. My friend Alfonso Farina presented me with another question that has occupied my mind for the last few weeks: What is the energy cost of computation? The first name that comes to mind in such a case is Rolf Landauer who, in 1961, wrote a fundamental paper on this question. The main conclusion drawn by Landauer was that each operation on a bit has an entropy cost of $K\ln 2$, $K$ being the Boltzmann constant. This means that, if you are operating at a temperature $T$, there will be heat emission of $KT\ln 2$, and this is the Landauer limit. This idea stems from the fact that information is not some abstract entity living in the hyperuranion: just to stay in the real world it needs a physical support. And wherever there is a physical support, thermodynamics and its second principle are at work. Otherwise, we could use information to evade the second principle and build our preferred perpetual motion machine. As Charles Bennett proved, Maxwell’s demon cannot work due to the Landauer limit (for a review see here).
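To get a feeling for the size of this limit, here is the number worked out for a single bit. The temperature $T=300$ K is an assumption standing in for room temperature.

$$
Q = KT\ln 2 \approx \left(1.380649\times 10^{-23}\ \mathrm{J/K}\right)\times\left(300\ \mathrm{K}\right)\times 0.6931 \approx 2.87\times 10^{-21}\ \mathrm{J} \approx 0.018\ \mathrm{eV}.
$$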

Recently, a group of researchers was able to show, with a smart experimental setup, that Landauer’s principle is indeed true (see here). This makes it mandatory to show theoretically that Landauer’s principle is indeed a theorem and not just a conjecture.

To accomplish this task, we need a conceptual tool that can map computation theory to physics. This tool has existed for a long time and was devised by Alan Turing: the Turing machine. A Turing machine is an idealized computational device conceived to show that there exist mathematical functions that cannot be computed in a finite time, a question asked by Hilbert in 1928 (see here). A Turing machine can compute whatever a real machine can (this is the content of the Church-Turing thesis). There exist different kinds of Turing machines but all are able to perform the same computations. The main difference lies in the complexity of the computation itself rather than in its realization. This conceptual tool is now an everyday tool in computation theory for proving fundamental results. So, if we are able to map a Turing machine onto a physical system and determine its entropy, we can move Landauer’s principle from the status of a conjecture to that of a theorem.

In my paper that appeared today on arXiv (see here) I was able to show that such a map exists. But how can we visualize it? Consider a Turing machine with two symbols and a probabilistic rule to move it. The probabilistic rule is just coded on another tape that can be consulted to make the next move. This represents a two-symbol probabilistic Turing machine. In physics we have such a system and it is very well known: the Ising model. As stated above, a probabilistic Turing machine can perform any computation a deterministic Turing machine can. What changes is the complexity of the computation itself (see here). Indeed, a sequence of symbols on the tape of the Turing machine is exactly a configuration of a one-dimensional Ising model. This model has no critical temperature and any configuration is a plausible outcome of a computation of a Turing machine, or its input. What we need is a proper time evolution that settles into the equilibrium state, representing the end of the computation.

The time evolution of the one-dimensional Ising model was formulated by Roy Glauber in 1963. The Glauber model has a master equation that converges, as time evolves, to an equilibrium with a Boltzmann distribution. The entropy of the model at the end of its evolution is well known and has the limit value $K\ln 2$, as it should when a single particle is considered, but this is just a lower limit. So, we can conclude that the operations of our Turing machine will involve a quantity of emitted heat in agreement with Landauer’s principle, and this is now a theorem. What is interesting to note is that the emitted heat at room temperature for a petabit of data is just a few millionths of a joule (about $3\times 10^{-6}$ J at 300 K), a very small amount. This keeps managing information convenient and cybercrime still easy to perform.
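To see this relaxation at work, here is a minimal Python sketch (standard library only) of single-spin-flip Glauber dynamics for a periodic one-dimensional Ising chain in zero field. It is an illustration under assumptions made here, not the derivation of the paper: the chain length, the value $\beta J = 0.5$, the number of sweeps, the seed and all function names are chosen for the example, and the flip probability is written in the equivalent heat-bath form $1/(1+e^{\beta\Delta E})$.

```python
import math
import random

def glauber_sweep(spins, beta_j, rng):
    """One sweep: N single-spin update attempts with Glauber flip probabilities."""
    n = len(spins)
    for _ in range(n):
        i = rng.randrange(n)
        neighbours = spins[i - 1] + spins[(i + 1) % n]   # periodic chain
        # Energy change, in units of J, if spin i is flipped.
        delta_e = 2.0 * spins[i] * neighbours
        # Glauber flip probability 1 / (1 + exp(beta * dE)).
        if rng.random() < 1.0 / (1.0 + math.exp(beta_j * delta_e)):
            spins[i] = -spins[i]

def energy_per_spin(spins):
    """Nearest-neighbour energy per spin, in units of J."""
    n = len(spins)
    return -sum(spins[i] * spins[(i + 1) % n] for i in range(n)) / n

def entropy_per_spin(beta_j):
    """Exact equilibrium entropy per spin of the infinite chain, in units of K."""
    return math.log(2.0 * math.cosh(beta_j)) - beta_j * math.tanh(beta_j)

rng = random.Random(2012)
beta_j = 0.5
spins = [rng.choice((-1, 1)) for _ in range(1000)]   # random initial tape
for _ in range(200):
    glauber_sweep(spins, beta_j, rng)

simulated_energy = energy_per_spin(spins)
exact_energy = -math.tanh(beta_j)   # equilibrium value for the infinite chain
```

After the sweeps, `simulated_energy` fluctuates around `exact_energy`, a sign that the chain has settled into the Boltzmann equilibrium. The function `entropy_per_spin` gives the corresponding textbook entropy per spin, which takes the value $\ln 2$ (in units of $K$) at $\beta J = 0$.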

Landauer, R. (1961). Irreversibility and Heat Generation in the Computing Process. IBM Journal of Research and Development, 5 (3), 183-191. DOI: 10.1147/rd.53.0183

Bennett, C. (2003). Notes on Landauer’s principle, reversible computation, and Maxwell’s Demon. Studies In History and Philosophy of Science Part B: Studies In History and Philosophy of Modern Physics, 34 (3), 501-510. DOI: 10.1016/S1355-2198(03)00039-X

Bérut, A., Arakelyan, A., Petrosyan, A., Ciliberto, S., Dillenschneider, R., & Lutz, E. (2012). Experimental verification of Landauer’s principle linking information and thermodynamics. Nature, 483 (7388), 187-189. DOI: 10.1038/nature10872

Marco Frasca (2012). Probabilistic Turing Machine and Landauer Limit. arXiv: 1206.0207v1

## Steve Jobs, 1955-2011

06/10/2011

You will change the other World too…

## A physics software repository

12/04/2011

Scientific publishing has undergone a significant revolution since Paul Ginsparg introduced arXiv. Before this great idea, people doing research used to send preprints of their works to some selected colleagues for comments. This habit was costly and time consuming, and reached very few people around the world until the paper eventually appeared in some archival journal. Ginsparg’s idea was to use the web to accomplish this task, making papers widely known to the whole community well before publication. This changed the way we do research, as it is now common practice to put a paper on arXiv before submission to a journal. This has had the effect of reducing the relevance of these journals for scientific communication. This is so true that Perelman’s papers on the Poincaré conjecture never appeared in the literature, they are just on arXiv, but the results were nonetheless generally acknowledged by the scientific community. This represents an extraordinary achievement for arXiv and shows unequivocally the greatness of Ginsparg’s idea.

Of course, research is not just writing articles and getting them published somewhere. An example is physics, where a lot of research activity relies on writing computer programs. This can happen on a lot of platforms such as Windows, Mac, Linux or machines performing parallel computations. Generally, these programs are confined to limited use by a small group of researchers, while other people around the world with similar problems could need them but are forced to reinvent the wheel. This happens again and again, and often one relies on the kindness of colleagues who in some cases may not be willing to give away their software. This situation is very similar to the one encountered before arXiv came into operation. So, my proposal is quite simple: People in the scientific community willing to share their software should be encouraged to do so through a repository that fits the bill. This could be easily obtained by extending arXiv itself, which already contains several papers presenting software written by our colleagues who, aiming to share it, just put a link there. But with a repository it would be easier to maintain versions, as already happens for papers, and there would be no need to create an ad hoc site that could be lost in the course of time.

I do not know if this proposal will meet with success but it is my personal conviction that a lot of people around the world have this need, as can easily be seen from the popularity of certain links for downloading programs for computations in physics. This need is growing thanks to parallel computation being made available on desktop computers, which today is a reality. I look forward to hearing news about this.

## CUDA: Upgrading to 3 Tflops

29/03/2011

When I was a graduate student I heard a lot about the wonderful performance of the Cray-1 parallel computer and the promise of exploring unknown fields of knowledge with this unleashed power. This admirable machine reached a peak of 250 Mflops. Its close relative, the Cray-2, performed at 1700 Mflops and for scientists this was indeed a new era in attacking difficult mathematical problems. But when you look at QCD, all these seem just toys for a kindergarten and one is not even able to perform the simplest computations to extract meaningful physical results. So, physicists started to design very specialized machines in the hope of improving the situation.

Today the situation has changed dramatically. The reason is that the increasing need for computation to perform complex tasks on a video output requires extensive parallel computation capability for very simple mathematical tasks. But these mathematical tasks are all one needs to perform scientific computations. The flagship company in this area is Nvidia, which produced CUDA for its graphics cards. This means that today one can have high-performance parallel computation on a desktop computer, and we are talking about a capability of some Teraflops! All this at a very affordable cost. For a few bucks you can have on your desktop a machine performing a thousand times better than a legendary Cray machine. Now, the counterpart of a Cray-1 is a CUDA cluster breaking the Petaflops barrier! Something people were dreaming of just a few years ago. This means that you can do complex and meaningful QCD computations in your office, when you like, without the need to share CPU time with anybody, pushing your machine to its best. All this with costs that are not a concern anymore.
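As a rough check of these orders of magnitude, take the Cray figures quoted above and, as an assumption for the desktop, a round 1 Tflops in single precision:

$$
\frac{1\ \mathrm{Tflops}}{250\ \mathrm{Mflops}} = \frac{10^{12}}{2.5\times 10^{8}} = 4000, \qquad
\frac{1\ \mathrm{Tflops}}{1700\ \mathrm{Mflops}} = \frac{10^{12}}{1.7\times 10^{9}} \approx 590, \qquad
\frac{1\ \mathrm{Pflops}}{250\ \mathrm{Mflops}} = \frac{10^{15}}{2.5\times 10^{8}} = 4\times 10^{6}.
$$

These are ratios of peak figures only and say nothing about sustained performance on a real lattice code.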

So, with this opportunity in sight, I jumped on the bandwagon and a few months ago I upgraded my desktop computer at home into a CUDA supercomputer. The first idea was just to buy old hardware from eBay at very low cost to build on what was already in my machine. In 2008 the top of the Nvidia GeForce cards was the 9800 GX2. This card comes equipped with a couple of GPUs with 128 cores each, 0.5 GB of RAM for each GPU and support for CUDA architecture 1.1. No double precision is available. This option started to be present some time later, with cards having CUDA architecture 1.3. You can find one of these cards on eBay for about 100-120 euros. You will also need a proper motherboard. Indeed, again in 2008, Nvidia produced the nForce 790i Ultra, well suited for these aims. This board is designed for a 3-way SLI configuration and, as my readers know, I installed up to three 9800 GX2 cards on it. I got this board on eBay for a price similar to that of the video cards. Also, before starting this adventure, I already had a 750 W Cooler Master power supply. It did not take much time to have this hardware up and running, reaching the considerable computational power of 2 Tflops in single precision, all this with hardware at least 3 years old! For the operating system I chose Windows 7 Ultimate 64 bit after an initial failure with Ubuntu Linux 64 bit.

There is a wide choice on the web of software to run for QCD. The most widespread is surely the MILC code. This code is written for a multi-processor environment and represents the effort of several people spanning several years of development. It is well written and rather well documented. Using this code, a lot of papers on lattice QCD have been published in the most relevant archival journals. Quite recently they started to port this code to CUDA GPUs, following a trend common to all academia. Of course, for my aims, being a lone user of CUDA and not having much time for development, I faced the not very attractive prospect of porting this code to GPUs myself. But, at the same time as I upgraded my machine, Pedro Bicudo and Nuno Cardoso published their paper on arXiv (see here) and promptly made available their code for SU(2) QCD on CUDA GPUs. You can download their up-to-date code here (if you plan to use this code just let them know, as they are very helpful). So, I ported this code, originally written for Linux, to Windows 7 and I got it up and running, obtaining correct output for lattices up to $56^4$ working just in single precision as, for this hardware configuration, no double precision was available. The execution time was acceptable, a few seconds on the GPUs and some more at the start of the program due to exchanges between CPU and GPUs. So, already at this stage I am able to be productive at a professional level with lattice computations. Just a little complaint is in order here. On the web it is very easy to find good code to perform lattice QCD but it is impossible to find anything for post-processing of configurations. This code is as important as the former: Without computation of observables one can do nothing with configurations or whatever else lattice QCD yields on however powerful a machine. So, I think it would be worthwhile to have both kinds of code available, to get spectra, propagators and so on starting from a standard configuration file independently of the program that generated it. Similarly, it appears almost impossible to get code for computations in lattice scalar field theory (many thanks to Colin Morningstar for providing me with code for 2+1 dimensions!). This is a workhorse for people learning lattice computation and it would be helpful, at least for pedagogical reasons, to make it available in the same way QCD code is. But now, I leave complaints aside and go to the most interesting part of this post: the upgrade.
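To give an idea of what a $56^4$ lattice means in terms of memory, here is a back-of-the-envelope estimate in Python. Everything in it is an assumption made for illustration and not the actual storage layout of Nuno and Pedro’s code: one SU(2) link per direction and per site, each link stored as 4 real numbers (a common parametrization of SU(2)), 4 bytes per real number in single precision, and nothing counted except the gauge links. The function name `su2_lattice_bytes` is hypothetical.

```python
def su2_lattice_bytes(side, dims=4, reals_per_link=4, bytes_per_real=4):
    """Rough memory needed for the gauge links of a side**dims SU(2) lattice."""
    sites = side ** dims
    links = sites * dims            # one link per direction at every site
    return links * reals_per_link * bytes_per_real

sites_56 = 56 ** 4                  # number of lattice sites
bytes_56 = su2_lattice_bytes(56)    # single precision
mib_56 = bytes_56 / 2 ** 20         # the same figure in MiB

# Double precision doubles the size of every real number.
bytes_56_double = su2_lattice_bytes(56, bytes_per_real=8)
```

Under these assumptions the links alone take about 600 MiB in single precision, more than the 0.5 GB available on each GPU of a 9800 GX2, so a lattice of this size has to be spread over several GPUs.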

These days I made another effort to improve my machine. The idea is to improve performance, with larger lattices and shorter execution times, while reducing overheating and noise. Besides, the hardware I worked with was so old that its architecture did not provide double precision. So, I decided to buy a couple of GeForce 580 GTX. This is the top of the GeForce cards (the 590 GTX is a couple of 580 GTX on a single card) and yields 1.5 Tflops in single precision (the 9800 GX2 stopped at 1 Tflops in single precision). It has the Fermi architecture (CUDA 2.0) and provides double precision at a possible performance of at least 0.5 Tflops. But, as happens for all video cards, a model has several manufacturers and these manufacturers may decide to change something in performance. After some difficulties with the dealer, I was able to get a couple of high-performance MSI N580GTX Twin Frozr II/OC at a very convenient price. With respect to the original Nvidia card, these come overclocked, with a proprietary cooling system that grants a temperature 19°C lower than that of the original card. Besides, higher quality components were used. I received these cards yesterday and installed them immediately. In a few minutes Windows 7 installed the drivers. I recompiled my executable and finally I performed a successful computation up to $66^4$ with the latest version of Nuno and Pedro’s code. Then, I checked the temperature of the cards with Nvidia System Monitor and I saw a temperature of 60°C for each card and the cooler working at 106%. This was at least 24°C lower than my 9800 GX2 cards! Execution times on the GPUs were reduced by at least a half. This new configuration provides 3 Tflops in single precision and at least 1 Tflops in double precision. My present hardware configuration is the following:

So far, I have not had much time to experiment with the new hardware. I hope to tell you more in the near future. Just stay tuned!

Nuno Cardoso, & Pedro Bicudo (2010). SU(2) Lattice Gauge Theory Simulations on Fermi GPUs. J. Comput. Phys. 230, 3998-4010 (2011). arXiv: 1010.4834v2

## CUDA: Lattice QCD at your desktop

15/03/2011

As my readers know, I have built a CUDA machine on my desktop for a few bucks to have lattice QCD at home. There are a couple of reasons to write this post and the most important of these is that Pedro Bicudo and Nuno Cardoso have had their paper published in an archival journal (see here). They produced very good code to run on a CUDA machine to do SU(2) lattice QCD (download link) that I have got up and running on my computer. They are working on the SU(3) version, which is almost ready. I hope to say more about this in the very near future. Currently, I am porting the MILC code for the computation of the gluon propagator to my machine, to use the configurations I am able to generate with Nuno and Pedro’s code. This MILC code fits my needs quite well and is very well written. This task will take me some time and unfortunately I do not have much of it.

Presently, Nuno and Pedro’s code runs perfectly on my machine (see my preceding post here). There was no problem in the code; I had just missed a compiler option to make the GPUs communicate through the MPI library. Once I corrected this, everything ran like a charm. From a hardware standpoint, I was unable to get my machine working perfectly with three cards and the reason was just overheating. A chip of the motherboard ended up beneath one of the video cards, resulting in erratic behavior of the chipset. Windows 7 even saw a floppy disk drive when I have none! So, I decided to work with just two cards and now the system works perfectly, is stable and Windows 7 always sees four GPUs.

Nuno sent me an updated version of their code. I will make it run as soon as possible. Of course, I know that this porting will be as smooth as before and will take just a few minutes of my time. I suggested that he keep their site up to date with the latest version of the code, as it is continuously evolving.

Another important reason to write this post is that I am migrating from my old GeForce 9800 GX2 cards to a couple of the latest GeForce 580 GTX with Fermi architecture. This will cost less than one thousand euros and I will be able to get 3 Tflops in single precision and 1 Tflops in double precision, with more RAM for each GPU. The ambition is to upgrade my CUDA machine to the computational capabilities that, in 2007, made a breakthrough in the lattice studies of the propagators for Yang-Mills theory. The main idea is to have both the code for Yang-Mills and for scalar field theory running under CUDA, comparing their quantum behavior in the infrared limit, an idea pioneered by Rafael Frigori quite recently (see here). Rafael showed through lattice computations that my mapping theorem (see here and references therein) is true also in 2+1 dimensions.

The GeForce 580 GTX cards that I bought are from MSI (see here). These cards are overclocked with respect to the standard product and come at a very convenient price. I should say that my hardware is already stable and I am able to produce software right now. But this upgrade will take me to the Fermi architecture, opening up the possibility of double precision on CUDA. I hope to report here in the near future about this new architecture and its advantages.

Nuno Cardoso & Pedro Bicudo (2010). SU(2) Lattice Gauge Theory Simulations on Fermi GPUs. J. Comput. Phys. 230:3998-4010, 2011. arXiv: 1010.4834v2

Rafael B. Frigori (2009). Screening masses in quenched (2+1)d Yang-Mills theory: universality from dynamics? Nuclear Physics B, Volume 833, Issues 1-2, 1 July 2010, Pages 17-27. arXiv: 0912.2871v2

Marco Frasca (2010). Mapping theorem and Green functions in Yang-Mills theory. PoS(FacesQCD)039, 2011. arXiv: 1011.3643v3

16/02/2011

As promised (see here), I am here to talk again about my CUDA machine. I have made the following upgrades:

• Added 4 GB of RAM and now I have 8 GB of DDR3 RAM clocked at 1333 MHz. This is the maximum allowed by my motherboard.
• Added a third 9800 GX2 graphics card. This one is an XFX, while the other two that I had already installed are EVGA and Nvidia respectively. These three cards are not perfectly identical, as the EVGA is overclocked by the manufacturer and the firmware may not be the same on all of them.

At the start of the upgrade process things were not so straightforward. Sometimes the BIOS complained at boot about the position of the cards in the three PCI Express 2.0 slots and the system did not start at all. But once I had found the right arrangement by permuting the three cards, Windows 7 recognized all of them, the latest Nvidia drivers installed like a charm and the Nvidia system monitor showed the physical state of all the GPUs. Heat is a concern here, as the video cards work at about 70 °C while the rest of the hardware is at about 50 °C. The case is always open and I intend to keep it so, to reduce the risk of overheating to a minimum.

The main problem arose when I tried to run my CUDA applications from a command window. I have a simple program that just enumerates the GPUs in the system, and the program for lattice computations by Pedro Bicudo and Nuno Cardoso can also check the system to identify the exact set of resources it needs to perform its work at best. Both applications, which I recompiled on the upgraded platform, saw just a single GPU. It was impossible, at first, to get meaningful behavior from the system. I thought that this could be a hardware problem and contacted XFX support for my motherboard. I bought my motherboard second hand, but I was able to register the product thanks to the seller, who had already done so. People at XFX were very helpful and fast in giving me an answer. The technician essentially told me that the system should work, and he gave me some advice to identify possible problems. I would like to recall that a 9800 GX2 contains two GPUs, so I have six GPUs to work with. I checked the whole system again until I got the nice configuration above, with Windows 7 seeing all the cards.

Just one point remained unanswered: why my CUDA applications did not see the right number of GPUs. This had been an old problem for Nvidia and was overcome with a driver revision long before I tried it myself. Currently, my driver is 266.58, the latest one. The solution came out unexpectedly. It was enough to change a multi-GPU setting in the Performance menu of the Nvidia monitor and I got back 5 GPUs instead of just 1. This is not six, but I fear that I cannot do better. The applications now work fine. I recompiled them all and I have successfully run the lattice computation up to a $76^4$ lattice in single precision! With these numbers I am already able to perform professional work in lattice computations at home.

Then I spent some time setting up the development environment with the Parallel Nsight debugger and Visual Studio 2008 for 64-bit applications. So far, I have been able to build the executable of the lattice simulation under VS 2008. My aim is to debug it to understand why some values in the output become zero when they should not. I would also like to understand why the new version of the lattice simulation that Nuno sent me does not seem to work properly on my platform. It took me some time to configure Parallel Nsight for my machine. You need at least two graphics cards to get it running, and you have to activate PhysX, in the Performance monitor of Nvidia, on the card that will not run your application. This was a simple enough task, as the online manual of the debugger is well written. The enclosed examples are also very useful. I will spend next weekend fine-tuning the whole setup and starting some work with the lattice simulation.

As I go further with this activity I will keep you informed on my blog. If you want to start such an enterprise yourself, feel free to get in touch with me to overcome the difficulties and hurdles you will encounter. Certainly, things proved to be not as complicated as they appeared at the start.

## CUDA: An update

04/02/2011

My activity with CUDA technology by Nvidia and parallel computing is going on (see here). I was able to get the code made available by Pedro Bicudo and Nuno Cardoso (see here) up and running on my machine. This is a code for SU(2) QCD and, currently, these colleagues are working on the SU(3) version. The code has been written directly for a machine supporting GPU computing with the CUDA architecture.

Initially, I was able to get link configurations for lattices as large as $14^4$, not very large but useful for some simple analysis. After a suggestion by Nuno, I changed a parameter in the code (the number of threads per block) from 16 to 8 and the simulation reached the impressive lattice volume of $64^4$! I am only able to do computations in single precision, as my graphics cards were built in 2008, when double precision was yet to come. But now I am in a position to do professional analysis of lattice simulations.
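To give a feeling for how quickly the lattice volume eats memory, the storage needed by one gauge configuration can be estimated roughly as follows. This is only a sketch under stated assumptions, not a description of the code by Nuno and Pedro: the function name `link_storage_bytes` is hypothetical, each SU(2) link is assumed to be stored as 4 real numbers (a common compact parametrization) in single precision, and any workspace buffers are ignored.

```python
def link_storage_bytes(extent, dims=4, reals_per_link=4, bytes_per_real=4):
    """Bytes needed to store one gauge configuration on an extent^dims lattice.

    Assumptions (not taken from the actual simulation code):
    - one link per site and per direction,
    - each SU(2) link stored as `reals_per_link` real numbers,
    - `bytes_per_real` = 4 for single precision, 8 for double precision.
    """
    sites = extent ** dims
    links = sites * dims
    return links * reals_per_link * bytes_per_real


if __name__ == "__main__":
    # Lattice sizes quoted in these posts.
    for extent in (14, 64, 76):
        size = link_storage_bytes(extent)
        print(f"{extent}^4 lattice: {size / 2**20:.1f} MiB in single precision")
```

Under these assumptions a $64^4$ configuration takes exactly 1 GiB, which gives an idea of why cards with 512 MB per GPU fill up so quickly and why double precision, which doubles the figure, calls for more memory.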

I would like to recall here the current configuration of my machine:

• CPU: Intel Core 2 Duo E8500 at 3.16 GHz per core, 6 MB cache.
• 4 GB of DDR3 RAM.
• Two 9800 GX2 graphics cards, each with two GPUs and 512 MB of GDDR3 RAM per GPU. So, I have 4 GPUs at work.
• Motherboard XFX 790i Ultra (3-way SLI).
• PSU Cooler Master Silent Pro Gold 1000 W.
• Windows 7 Ultimate 64 bit
• CUDA Toolkit 3.2
• Visual Studio 2008 SP1
• Parallel Nsight (Nvidia debugger for CUDA)

This configuration performs at 2 Tflops in single precision and I have reached the performance stated above for lattice QCD. The output file for a single run was about 4 GB. The simulation needs some debugging after porting, as some values in the output file are zero when they should not be. Plaquette values, on the other hand, are good. Nuno produced new code from the old one, but I was not able to get it running properly even though it compiled correctly.
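The plaquette check mentioned above can be illustrated with a small toy example. This is not the CUDA code by Nuno and Pedro: it is a pure Python sketch, under the assumption that each SU(2) link is stored as a unit quaternion, so that half the trace of the matrix is just its first component. All function names are hypothetical. It computes the average plaquette on a tiny periodic lattice and checks that the value does not change under a random gauge transformation.

```python
import itertools
import math
import random


def qmul(a, b):
    """Hamilton product of two quaternions (a0, a1, a2, a3)."""
    a0, a1, a2, a3 = a
    b0, b1, b2, b3 = b
    return (a0*b0 - a1*b1 - a2*b2 - a3*b3,
            a0*b1 + a1*b0 + a2*b3 - a3*b2,
            a0*b2 - a1*b3 + a2*b0 + a3*b1,
            a0*b3 + a1*b2 - a2*b1 + a3*b0)


def qconj(a):
    """Quaternion conjugate, i.e. the Hermitian conjugate of the SU(2) matrix."""
    return (a[0], -a[1], -a[2], -a[3])


def random_su2(rng):
    """Random SU(2) element: a normalized vector of 4 Gaussian numbers."""
    q = [rng.gauss(0.0, 1.0) for _ in range(4)]
    norm = math.sqrt(sum(c * c for c in q))
    return tuple(c / norm for c in q)


def shift(site, mu, extent):
    """Neighbour of `site` one step along direction mu (periodic boundaries)."""
    s = list(site)
    s[mu] = (s[mu] + 1) % extent
    return tuple(s)


def average_plaquette(links, extent, dims=4):
    """Average of (1/2) Re Tr of the plaquette over all sites and planes."""
    total, count = 0.0, 0
    for site in itertools.product(range(extent), repeat=dims):
        for mu in range(dims):
            for nu in range(mu + 1, dims):
                u = qmul(links[site, mu], links[shift(site, mu, extent), nu])
                u = qmul(u, qconj(links[shift(site, nu, extent), mu]))
                u = qmul(u, qconj(links[site, nu]))
                total += u[0]  # half the trace is the real component
                count += 1
    return total / count


def gauge_transform(links, gauge, extent):
    """Apply U_mu(x) -> g(x) U_mu(x) g(x+mu)^dagger to every link."""
    return {(site, mu): qmul(qmul(gauge[site], u),
                             qconj(gauge[shift(site, mu, extent)]))
            for (site, mu), u in links.items()}


extent = 2  # tiny lattice, just for illustration
rng = random.Random(0)
sites = list(itertools.product(range(extent), repeat=4))
# Cold start: every link is the identity.
cold = {(s, mu): (1.0, 0.0, 0.0, 0.0) for s in sites for mu in range(4)}
gauge = {s: random_su2(rng) for s in sites}

if __name__ == "__main__":
    print("cold start:", average_plaquette(cold, extent))
    print("after gauge transformation:",
          average_plaquette(gauge_transform(cold, gauge, extent), extent))
```

For the cold configuration the average plaquette is 1, and it stays at 1 (up to rounding) after the gauge transformation. A value that drifts under such a transformation would point to a bug in the way links are stored or indexed.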

During the weekend I am planning to upgrade the machine further. I will install another 9800 GX2 card (this one is an XFX while the others are EVGA and Nvidia respectively, but they are identical as the only producer is Nvidia) and 4 GB of RAM, reaching the maximum of 8 GB of RAM for my motherboard. The aim of this upgrade is to evaluate both the gluon propagator and the spectrum at very large volumes, comparable with those of the landmark works presented at Regensburg in 2007. I would also like to get some code to solve $\lambda\phi^4$ theory to check my mapping theorem in four dimensions. I would like to emphasize that Rafael Frigori proved it correct in 2+1 dimensions (see here).

After the upgrade I will report on the blog. As I get more time for this, I will be able to produce some useful results that I hope to post here.

Frigori, R. (2010). Screening masses in quenched (2+1)d Yang–Mills theory: Universality from dynamics? Nuclear Physics B, 833 (1-2), 17-27 DOI: 10.1016/j.nuclphysb.2010.02.021