Repeatability of Large Computations

The role of large computations in climate science is broadly misconstrued, and one of my goals is to give people a better understanding of what that role is and what it isn't.

The high-performance computing world is not perfect, though, and not just in climate science. The providers of infrastructure have focused excessively on computational speed, at the expense of other forms of utility. One consequence is that a computation performed only a decade ago on the top-performing machines is in practice impossible to repeat bit-for-bit on any machine being maintained today. What's more, since climate models in particular are exquisitely sensitive to initial conditions, it is very difficult to determine whether a recomputation is actually a realization of the same system, or whether a bug has been introduced.
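To make that sensitivity concrete, here is a minimal sketch (mine, not from any GCM): the classic Lorenz-63 system integrated twice with initial conditions differing by one part in 10^12, roughly the scale of a last-bit rounding difference between two machines.

```python
# Sketch, not from any climate model: the Lorenz-63 system with the
# classic parameter values, stepped with forward Euler. Two runs whose
# initial x differs by 1e-12 -- about a last-bit change -- end up far
# apart, so a bitwise mismatch at the start can look like a different
# system by the end.

def lorenz_step(x, y, z, dt=0.01, sigma=10.0, rho=28.0, beta=8.0 / 3.0):
    """One forward-Euler step of the Lorenz-63 equations."""
    return (x + dt * sigma * (y - x),
            y + dt * (x * (rho - z) - y),
            z + dt * (x * y - beta * z))

def trajectory(x0, steps=5000):
    """Integrate from (x0, 1.0, 1.05) and return the final state."""
    x, y, z = x0, 1.0, 1.05
    for _ in range(steps):
        x, y, z = lorenz_step(x, y, z)
    return (x, y, z)

a = trajectory(1.0)
b = trajectory(1.0 + 1e-12)  # a perturbation near the last bit of a double
# After a few thousand steps the two trajectories have visibly diverged,
# even though both are perfectly valid realizations of the same system.
```

Which is the crux of the recomputation problem: a rerun that disagrees with the archived result may be the same model on different hardware, or a genuine bug, and the trajectory alone cannot tell you which.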

I think it's important to state that large climate models remain a significant triumph of science, even though their role in science is widely misunderstood. But that doesn't mean they are above criticism.

The issues raised in the Recomputation Manifesto are very relevant to climate science and some of its related fields.


  1. Many thanks for the mention of the Recomputation Manifesto.

    I hadn't thought of climate science as a target but of course you are 100% right. The more openness the better and rerunning old models would be fantastic.

    There are obviously a lot of issues with large-scale experiments like this... but the challenge could be fascinating.


  2. I'm not really a numerics expert, but here's my 2 cents:

    For all practical purposes there's not much of a difference to physical experiments. (Plus, there's even a similar problem in mathematics: a classic example is Perelman's work on the Poincaré conjecture. His math wasn't really "recomputable": other mathematicians had to check and work out his papers until his work was "believed".)

    Repeatability is much less important than reproducibility. And the reproducibility can be in a statistical sense, just like some physics experiments. Even recomputation can get difficult: 1) I can imagine algorithms (and even hardware) where unimportant precision can be discarded depending on workload, inserting a Lorenz butterfly. 2) There are e.g. stochastic climate models driven by stochastic differential equations, where the noise is perhaps generated by a fast quantum device rather than a slower deterministic algorithm.
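    The "statistical sense" above can be sketched with a toy stochastic model (my illustration, not anything from the comment): an AR(1) process driven by random noise. Two runs with different seeds are never repeatable path-by-path, but their long-run statistics agree, which is the kind of reproducibility that matters.

```python
# Sketch: a toy stochastic "climate" -- an AR(1) process x_{t+1} = phi*x_t + noise.
# Different seeds give bitwise-different paths, but the same statistics:
# the stationary variance should be near 1 / (1 - phi**2) for both runs.
import random

def run_model(seed, steps=200_000, phi=0.9):
    """Run the AR(1) process and return (sample mean, sample variance)."""
    rng = random.Random(seed)
    x, total, total_sq = 0.0, 0.0, 0.0
    for _ in range(steps):
        x = phi * x + rng.gauss(0.0, 1.0)  # AR(1) update with unit noise
        total += x
        total_sq += x * x
    mean = total / steps
    var = total_sq / steps - mean * mean
    return mean, var

m1, v1 = run_model(seed=1)
m2, v2 = run_model(seed=2)
# The two paths differ everywhere, yet both sample variances sit close to
# the theoretical 1 / (1 - 0.9**2), i.e. about 5.26.
```

    A check on the variance, not on the path, is the analogue of asking a climate model for its statistics rather than its trajectory.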

  3. From my experience working as a scientific programmer, it's often not really worth the effort to precisely repeat something. There are just so many factors that can make a difference, many of which have nothing whatsoever to do with the underlying question that is being asked.

    Anyone who has ever ported a program to a new hardware platform or a new operating system (or even tried to recompile from old source code without the original compiler and settings) understands that all too well.

    While the latter is the impetus to run a program in a virtual machine, it's often better to just use a different (preferably independent) method and show that the answers of the two methods are consistent.

    This seems to have been lost on a lot (if not most or even all) of the folks in the climate "auditing" community, and was driven home when, rather than simply recompiling and re-running the NASA GISTEMP code (which proved to be challenging, especially for the FORTRANly challenged at Climate Audit), folks like John Van Vliet wrote new code and effectively replicated the GISS results.

  4. I'm not so convinced by the scientific virtues of repeatability. Restricting myself to GCM code, which I know, there are no scientific gains from repeatability. The gains are entirely in terms of bug finding and fixing.

    All the valid scientific results from climate model runs are statistical; you should obtain the same answers from a new run with perturbed initial conditions. If you don't, you're asking the wrong questions.

    Bug-fixing is a virtue, but it isn't strong enough to demand bit-reproducibility; nor is it good to fail to distinguish science from bug-fixing.

    Compiler writers generally provide a range of options to optimise code, and in the GCM case (at least with HadCM3, which I knew) this translated into a spectrum from "fast but non-reproducible" to "slower but reproducible" (reproducible, in this context, meaning bit-reproducible independent of the number and configuration of processors you were using). Generally, the main runs were done slower-but-reproducible, for convenience, and because "fast" wasn't that much faster (??30%?? perhaps - don't trust that number).
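    Why processor configuration touches the bits at all can be sketched in a few lines (my illustration, not HadCM3 code): summing the same numbers in differently sized chunks, as a different processor decomposition would, can change the last bits of the result, because floating-point addition is not associative.

```python
# Sketch: the same 100,000 numbers summed as 2 partial sums vs 16 partial
# sums -- standing in for 2 vs 16 processors. The results agree to many
# digits but need not agree bit-for-bit; a single fixed summation order
# is what buys "reproducible independent of processor count".
import math

values = [1.0 + math.sin(i) * 1e-8 for i in range(100_000)]

def chunked_sum(xs, nchunks):
    """Partial sums per 'processor', then a sum of the partials."""
    size = -(-len(xs) // nchunks)  # ceiling division
    partials = [sum(xs[i:i + size]) for i in range(0, len(xs), size)]
    return sum(partials)

s2 = chunked_sum(values, 2)
s16 = chunked_sum(values, 16)
# s2 and s16 typically differ only in the last bits; rerunning either
# chunking gives identical bits, since the order is then fixed.
```

    Forcing one canonical reduction order restores bit-reproducibility, but serialises work that the "fast" option would let each processor layout do in its own order.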

