A physics software repository

Scientific publishing underwent a significant revolution when Paul Ginsparg introduced arXiv. Before this great idea, researchers used to send preprints of their work to a few selected colleagues for comments. This habit was costly and time-consuming, and it reached very few people around the world until the paper eventually appeared in an archival journal. Ginsparg’s idea was to use the web for this task, making papers widely known to the whole community well before publication. This changed the way we do research: it is now common practice to post a paper on arXiv before submitting it to a journal. As a result, journals have lost much of their relevance for scientific communication. This is so true that Perelman’s papers on the Poincaré conjecture never appeared in the journal literature; they exist only on arXiv, yet the results were generally acknowledged by the scientific community. This is an extraordinary achievement for arXiv and shows unequivocally the greatness of Ginsparg’s idea.

Of course, research is not just writing articles and getting them published somewhere. Physics is an example: a lot of research activity relies on writing computer programs. This can happen on many platforms, such as Windows, Mac, Linux, or machines performing parallel computations. Generally, these programs are confined to limited use within a small group of researchers, while other people around the world with similar problems may need them but are forced to reinvent the wheel. This happens again and again, and one often has to rely on the kindness of colleagues who, in some cases, may not be willing to give away their software. This situation is very similar to the one we faced before arXiv came into operation. So my proposal is quite simple: people in the scientific community who are willing to share their software should be encouraged to do so through a repository that fits the bill. This could easily be achieved by extending arXiv itself, which already contains several papers presenting software written by colleagues who, wishing to share it, simply put a link there. With a proper repository, however, it would be easier to maintain versions, as already happens for papers, and there would be no need to create an ad hoc site that could be lost in the course of time.

I do not know if this proposal will meet with success, but I am personally convinced that many people around the world have this need, as the popularity of certain links for downloading programs for physics computations shows. This need is steadily growing now that parallel computation on desktop computers has become a reality. I look forward to hearing news about this.


6 Responses to A physics software repository

  1. Simon says:

    Hi Marco, I’m not quite sure what you’re getting at.

    The arXiv already accepts ancillary files where you can attach source code and data files.

    And there’s a variety of software repository sites that you can store/develop software at, and ones like github and google code are not going anywhere soon. They also make it incredibly easy for someone to branch your code. (Admittedly, most of the large hosts do not allow private branches on their free accounts.)

    Finally, there are already a few repositories of high energy particle physics software. (“FreeHEP… on the web since 1992”: that’s only one year after the arXiv!)

    Is there anything that you want that is not satisfied by one or a combination of these solutions?

    • mfrasca says:

      Hi Simon,

      Nice to hear from you again. As your comment clearly shows, there is a spread between repositories that can make it very difficult to identify the right place, requiring significant effort and a fair amount of luck.

      This is not the right way to work; it resembles the pioneering days when preprints went around by post. But now we have the web, and we can adopt a more systematic approach.

      Suppose, for example, that a PhD student needs code to compute the phase diagram of a 2D electron gas, which is considered a forefront research problem. If the department has no such code, one must search around the world for groups doing similar research and hope they will release their code freely. Otherwise, the question is: how many times must one reinvent the wheel to work out something that several people have already done many times?

      So we need a single repository with all public software (under a proper license agreement), properly classified and easy to search for and download. For software, I find there is an imbalance with respect to the freedom with which scientific information should spread. My hope is that this difficulty can be promptly removed.

      Best,

      Marco

  2. Simon says:

    Nice to hear from you again.

    Thanks! I comment less than I should, but I do regularly read your blog. It’s interesting stuff and often makes me wish I had time/knowledge to work on some of it.

    Just a few comments – in a sense I’m just thinking out loud, so don’t take anything personally!

    You complain that there is a spread between existing repositories, so why should we create another one and increase the spread? You talk about not wanting to reinvent the wheel, so why should we reinvent the software repository? Personally I’m fond of github. The sympy project uses it (the project leader is a physicist). I have a friend who works on http://www.gromacs.org/ and they moved their primary repository to github.

    I’m not sure that having a single HEP/physics/science repository system will be any better. How do you envisage this central/universal repository to work? Can anyone post there? What will its scope be? Will there be a minimal central moderation like the arXiv? Can anyone who wants to play with your code branch it and keep that branch on the repository?

    Large and respected projects will have web sites and published/arXived papers linking to the repository that they use. The multitude of small projects and hobby code would probably clog up a central repository unless something like a community rating/moderation system (as is sometimes discussed for the arXiv) is implemented to help separate the wheat from the chaff.

    ----

    Actually, after writing all of this, maybe you’re right. A central repository for project pages and current releases (not necessarily or even preferably the full repository used for the code development) might be a handy thing…

    But it will be hard to establish and convince people to use it. The arXiv has the advantage of being there since the beginning. There are other preprint servers that some people prefer, but the majority of papers go to the arXiv.

    • mfrasca says:

      Hi Simon,

      I am happy that you appreciate my blog and read it regularly. It means that what I write is helpful for the community, and this is really encouraging for my further activities.

      Of course, you are pointing out some shortcomings that should be expected from a site like this, but these are exactly the same ones the people who built arXiv faced. So I would expect a moderate amount of rubbish to get in, but people using such a server should be smart enough to make the right choice for their needs.

      My aim is not limited to the hep community: it should be a broader archive, exactly as arXiv is for preprints. So you could find software for hep, condensed matter, astronomy and so on. The license agreement for using each piece of software should be clearly stated.

      I think this is more practical than a link that could break in the long run. Besides, maintaining different versions through such an archive would be simple and easy to trace, and searching such a repository should be as easy as it already is for arXiv.

      I do not know if this proposal will be successful. I have always found my colleagues very kind and collaborative, but I am also sure that this would lighten the burden on our community, including the management of the software that we largely produce ourselves.

      Best,

      Marco

    • Dear Simon and Marco,

      Thanks for writing about an issue I have often thought about: the need for an archive of scientific software. My research interests center very much on writing software for science, and there are very few places to publish such software. I could write a long essay on my thoughts, but for the moment I will let a few main points suffice. I shall go through the alternative possibilities:

      A. One can put the software on one’s own university web site. However, (a) when the person moves jobs or retires it will probably be lost, and (b) it is not as accessible as if it were “properly classified and easy to search for and download” in some central archive, as Marco said. There is also absolutely no quality control.

      B. One could use one of the software repositories, like sourceforge or github. The disadvantage of this solution is that we still lack easy classification and search possibilities. There are also other problems: the software is mixed with lots of non-scientific software (including many computer games), and there is a plethora of unfinished software mixed in there. Again, quality control is lacking.

      C. There are also journals that publish software, including ACM Transactions on Mathematical Software and the Journal of Statistical Software. These journals keep published software in their own repositories, presumably indefinitely, and here there is quality control. But not all software is suitable for publication in these journals, and another problem is the publishing and reviewing time (cf. Marco’s comments on the advantages of arXiv compared with the journals). Another issue is software licensing: both journals require the published software to be released under a specific license (JSS uses the GNU GPL, and ACM TOMS has its own similar license). One may want to use a more liberal license, allowing for instance Matlab, NAG, Intel, or others to use the software if they want.

      D. The Netlib repository is, I believe, the existing one that comes closest to what we may want. It has the BLAS and LAPACK reference implementations among many other things. But it lacks a nice search/browse interface, the categorization is primitive, the requirements and procedure for submission are not described, and I believe most of the contents date from the last century.

      E. There are some fairly reasonable alternatives run by companies, such as MathWorks. Theirs is called Matlab Central. They have a nice licensing rule (employing the very liberal BSD license), but a major problem is that if one writes really nice software, they are likely to remove it once MathWorks get round to adding the feature to Matlab themselves. And of course this is just for Matlab programs.

      F. Then there are the issues regarding the setup of the site. Some of my thoughts:
      1) It should not be limited to physics.
      2) Some thought must go into its structure, to allow both searching and browsing (by programming language / category).
      3) There must be quality control. There are several alternatives:
      a) some sort of (short) review process;
      b) the Journal of Atmospheric physics (and others) puts the submission online after a preliminary acceptance, for criticism/review by anyone, before final acceptance;
      c) some sort of user rating system / comments by users…

      G. There should be clearly spelled-out requirements for the software: it should be complete, portable, well documented, and well structured; supporting code must be included (or already be in the archive); and there should be a handbook and examples of use, references to or a description of the algorithms, evidence that the program is correct (e.g. a comprehensive test program), and, if appropriate, a makefile (or automake files).

      H. So that scientists will be eager to develop code for the archive, a section of it should count as a scientific journal with ISI status.

      Enough for now,

      Kristjan Jonasson
      Dept. of Computer Science
      University of Iceland
      jonasson@hi.is

      • mfrasca says:

        Dear Kristjan,

        Thank you very much for your contribution. An expert’s view was much needed here, and I could not have said it better myself.

        My hope is that someone will seize the opportunity to build the repository our community greatly needs.

        Marco
