Special section in Science on reproducible research

Recently there was an issue of Science (December 2, 2011, Volume: 334, Issue: 6060) with a special section focusing on data replication and reproducibility in the sciences. It is about time that the big fish put this topic on the table.

Off-the-cuff, while Peng’s perspective on Reproducible Research in Computational Science is of particular interest to me the other accompanying perspectives are quite eye-opening to someone not doing empirical research. For example Tomasello and Call’s perspective on Methodological Challenges in the Study of Primate Cognition and Ryan’s Replication in Field Biology: The Case of the Frog-Eating Bat discuss challenges facing field research that I have never really spend much time thinking about (and which, if possible, gives me even more respect for people doing field work). Or as Ryan succinctly summarizes it

“Studies conducted in the field offer unique opportunities to observe nature, but achieving true replication under natural conditions is challenging.”

In light of this it is rather ironic that papers based on empirical work (field work in particular) often contain details that may be very difficult or impossible to reproduce while papers based on computational research typically omit details that would be straight forward to reproduce (e.g. computer code, random number generator seeds, etc.).

HT: Steph.

This is from the “Mario’s Entangled Bank” blog ( http://pineda-krch.com ) of Mario Pineda-Krch, a theoretical biologist at the University of Alberta.

Posted in reproducible research, science | Leave a comment

To Sweave, or not to Sweave, that is the question

I am about to start writing up the manuscript of my recent biomath seminar (Act 3: Pineda-Krch. 2011. Cycles at the edge of existence: Emergence of quasi-cycles in strongly destabilized ecosystems.). While the slides for the talk were put together using Sweave to illustrate how the literate programming paradigm can improve reproducibility the question now is if I should use Sweave for the manuscript as well. If one is to ensure reproducibility, it is a no brainer. In the computational sciences there currently are no better alternatives to ensuring reproducibility than an “executable” manuscript. The problem is, however, that while any self-respecting scientific journals would agree that reproducibility is important few journals go beyond vague wordings on this topic in their guidelines for authors. Specifically, very few journals explicitly accept manuscript prepared using any of the literate programming systems (e.g. noweb, CWEB, Sweave, etc.).

Typically the initial manuscript submission only requires a PDF that then goes out for peer review (if your lucky starts are aligned properly). Once your manuscript is accepted, however, you inevitably need to submit the LaTeX source (and if you don’t the journal may take the less travelled and perilous road down the valley of manual typesetting). Of course, with Sweave it would be straight forward to just submit the Sweave generated LaTeX file. The potential issue here is that this is not vanilla LaTeX (but, of course, it is not rocket science either for a progressive and open-minded journal) and this could be a problem, particularly if the journals has very specific format and/or formatting requirements and are particularly obsessive-compulsive about it (plenty of journals are).  So to Sweave your manuscript (and risk the wrath of the journal), or not to Sweave (and compromise reproducibility), that is the question.

Of course, a simple solution would be to submit the manuscript as a Sweave file and then simply considering any upcoming LaTeX problems not to be your problems to deal with. Or as a colleague put it once, once your manuscript is accepted getting it into print is their problem. As I am starting to write-up the manuscript I remain undecided, but I suspect being a reproducible research/literate programming/Sweave advocate the decision may have already been made for me, now it just needs time to sink in through my thick skull.

Computational sciences has a long way to go before it reaches the level of reproducibility that is taken for granted in empirical research and in mathematics. Or as Roger Peng much more eloquently expresses in his recent Science perspective:

“The field of science will not change overnight, but simply bringing the notion of reproducibility to the forefront and making it routine will make a difference. Ultimately, developing a culture of reproducibility in which it currently does not exist will require time and sustained effort from the scientific community.”

Perhaps my manuscript could be one small contribution towards this goal.

This is from the “Mario’s Entangled Bank” blog ( http://pineda-krch.com ) of Mario Pineda-Krch, a theoretical biologist at the University of Alberta.

Posted in manuscript, R, Sweave, writing | 9 Comments

New paper: Phenotypic plasticity

The proofs are done so I assume this is now officially in press to appear in the Encyclopedia of Theoretical Ecology (eds. Hastings & Gross) by California University Press. The book is slated to appear in May 2012 and at 870 pages (give or take) it promises to make the average brick look puny (but at three orders of magnitude higher price). Keep your eyes peeled at pages 545-550.

Phenotypic plasticity
Mario Pineda-Krch

Abstract
Phenotypic plasticity is the ability of a single genotype to express different phenotypes in response to environmental conditions, e.g. alternative morphologies, physiological states, or behaviour. Such plasticity can be expressed as several highly distinct phenotypes or as a continuous norm of reaction describing the functional interrelationship of a range of environments to a range of phenotypes.

This is from the “Mario’s Entangled Bank” blog ( http://pineda-krch.com ) of Mario Pineda-Krch, a theoretical biologist at the University of Alberta.

Posted in manuscript, phenotypic plasticity | Leave a comment

Happy Birthday Origin of Species with a homage to Morse Peckham

152 years ago on the day 1250 copies of the first edition of Charles Darwin’s On the Origin of Species (OOS) went on sale in the United Kingdom. By the end of the day all copies were sold out. All in all six editions of OOS were published in short order during Darwin’s lifetime. The 2nd edition appeared 1860, 3rd in 1861, 4th in 1866, 5th in 1869, and the 6th in 1872 (with a corrected 6th appearing in 1876). Between 1859-1890 a total of 39000 copies were printed in the UK alone and the publisher, John Murray, considered it to be a “bread-and-butter book”. The first american edition (of the 1st English edition) was printed in 1860 (New York: D. Appleton) followed by the Dutch and German in the same year, French in 1862, Russian in 1864, Swedish in 1871, Danish in 1872, Polish and Hungarian in 1873, Spanish in 1877, Serbian in 1878, Japanese in 1896, and Chinese in 1902.

Första svenska översättningen av Om arternas uppkomst.

For the Darwin scholars among us that do not have £100000 (give or take a few ten-thousand pounds) lying around (approx. $CAD162000 or $US155000), or prefer to use this amount of dough towards a mortgage, the next best thing is the Harvard University Press facsimile of the first edition. At $US23 you do not only avoid have to decide between purchasing a house or a copy of the OOS but you can also freely add your scribbles and annotations without giving anyone a cardiac arrest. The first edition of the OOS provides the clearest and most untarnished view of what Darwin set out to say and how he intended to say it. In subsequent edition Darwin revised the book extensively and repeatedly largely in an attempt to address criticism by the catholic establishment (e.g. Mivart). As a result many of his original lines of argument and evidence became clouded. About 75% of the sentences in the first edition were rewritten from one to five times. Over 1500 sentences were added and 325 were removed. Of the original and added sentences there are nearly 7500 variants of all kinds. In terms of net added sentences, the sixth and final edition is nearly 30% longer than the first edition. That is a lot of revising!

One of the more remarkable revisions is the subtle change occurring in the final paragraph between the first and second edition. In the first edition this paragraph is:

There is grandeur in this view of life, with its several powers, having been originally breathed into a few forms or into one; and that, whilst this planet has gone cycling on according to the fixed law of gravity, from so simple a beginning endless forms most beautiful and most wonderful have been, and are being, evolved.

while in the second edition it says (bold font is mine)

There is grandeur in this view of life, with its several powers, having been originally breathed by the creator into a few forms or into one; and that, whilst this planet has gone cycling on according to the fixed law of gravity, from so simple a beginning endless forms most beautiful and most wonderful have been, and are being, evolved.

While I cherish my $23 facsimile of the first edition of OOS there is one other book that is (if possible) even more indispensable to anyone interested in the evolution of OOS (no pun intended). One the centennial of the publication of the first edition of OOS Morse Peckham, a historian at the University of Pennsylvania published what was, and still is, undoubtedly the most important reference on the OOS, the Origin of Species – A Variorum Text. Although Morse Peckham (1914-1993) is not exactly a household name among Darwin scholars, he should be. His variorum OOS was a hugely important contributions to the history of biology, evolution, and science and is an astonishing reference for any serious discussion of how Darwin’s ideas developed over time. Without a variorum text one cannot understand the development of Darwin’s thought over time in the context of contemporary social and scientific influences. Without a variorum text, how could one know that a statement in the first edition was not modified in subsequent editions or that a statement in a later edition was not present in earlier editions? Without a variorum text, how would one related OOS to any reference in other scientific and non-scientific work of the time?

Darwin's The Origin of Species according to Morse Peckham's The Origin of Species - A Variorum Text.

Collating a variorum text of the six editions of OOS in the mid 1950, before the advent of computers, word processors, data bases, and text processing scripting languages must have been a formidable task. Just thinking of it gives me angst. Remarkably, however, in the acknowledgement Peckham states:

My thanks are particularly due to the Committee on Research of the University of Pennsylvania for grants which aided me in the collection of materials for this edition of Darwin’s The Origin of Species; and to the Officers and Trustees of the University of Pennsylvania for a semester’s leave of absence to work on this edition.

Does this actually suggests that Peckham put together his variorum OOS in one semester? That is hard to believe. I would have guessed that it would have taken several years to manually and painstakingly compare each sentence in the OSS in all six edition and piece together a variorum text. It is somewhat of a cruel fate that 50 some years later Peckham’s feat is essentially a trivial exercise involving googling and a few lines of text processing (i.e. download the full texts of each edition of OOS from the internet, run it through a few lines of Perl, or something similar, and within seconds you’ll have your own variorum text) or alternatively, if you feel really lazy, you could just visit the variorum site at The Complete Works of Darwin Online. Knowing Darwin, however, I have a feeling he would have approved of Peckham’s painstaking efforts.

So happy birthday On the Origin of Species and long live Morse Peckham’s variorum.

Peckham; you rock! The only On the Origin of Species a Darwin aficionado will ever need.

This is from the “Mario’s Entangled Bank” blog ( http://pineda-krch.com ) of Mario Pineda-Krch, a theoretical biologist at the University of Alberta.

Posted in Charles Darwin, evolution, Origin of Species | 4 Comments

F1000 review: Open science is a research accelerator

As promised previously, today the following post-publication evaluation of Open Science is a research accelerator by Michael Woelfle, Piero Olliaro and Matthew H. Todd appeared in Faculty of 1000 Biology:

Pineda-Krch, M. Faculty of 1000 Biology, 14 November 2011 http://f1000.com/13352995

It is commonly taken for granted that difficult scientific problems need to be solved the same way cathedrals are built: hierarchically and methodologically by a closed group of experts led by one or a few masters of the trade. As a result, scientific progress, just like the erection of a splendid cathedral, is often a slow meandering process, riddled with challenging problems requiring the technical skills of a few highly trained experts. A recent commentary in Nature Chemistry by Woelfle et al. overturns much of what we thought we knew about the virtues of the cathedral-building approach for solving scientific problems and offers a remarkable and unexpected alternative to the prevailing dogma.

This commentary describes a unique and unconventional scientific methodology used in a research project in organic chemistry, aimed at designing an alternative and cheaper synthesis of a drug used in treating schistosomiasis. This research is remarkable in that it was performed in full view of the public eye, with progress, data, results, analyses and manuscript drafts being posted online as they were generated (http://www.thesynapticleap.org/ [accessed 14 Nov 2011]). One can think of it as the research notebook that most scientists would have on their lab bench for recording the day’s trials and tribulations but with the added twists that this lab notebook is online and publicly available in real-time. This approach to the scientific process has been referred to as ‘open science’ or ‘open notebook science’ and has largely been inspired by the highly successful open-source software paradigm. Woelfle et al. are not the first research group to successfully use this approach or to experience the resulting acceleration of scientific discoveries. For example, Timothy Gowers, a University of Cambridge mathematician, a Fields medallist and the founder of the Polymath Project, described on his blog the process of open science “as being to ordinary research as driving is to pushing a car” {1}. This commentary is unique in that it is the first time that open science has explicitly been identified as a research accelerator in an academic journal with broad readership.

The dichotomy of the traditional closed and the emerging open science method is analogous to the central thesis of Eric S. Raymond’s essay ‘The Cathedral and the Bazaar’ {2}. In this work, Raymond contrasts two different styles of software development, the top-down closed Cathedral model where source code is only made available as part of official software releases and where code developed between releases is restricted to an exclusive group of developers, and the bottom-up open bazaar model where the code is developed in the public view in real-time. Historically, scientific research has been using the Cathedral model, where the scientific process is a secretive affair taking place behind closed doors among an exclusive group of researchers, with scientific discoveries reported thorough established communication channels and with no data or results made public ahead of time. In contrast, in the Bazaar model, the entire primary record of a research project is made publicly available as it is recorded with virtually no barriers for anyone interested to contribute to the discussion. One of the basic ideas in the Bazaar model of software development is Linus’ Law, named after Torvald Linus, the father of the GNU/Linux kernel project, which states that “given enough eyeballs, all bugs are shallow”, in other words, given enough programmers and people testing the code, almost every problem will be found quickly and the solution will be obvious to someone (but not necessarily to the same person who identified the issue). Just like scientific problems, modern software can be exceedingly complex, and having access to enough eyeballs has turned out to be stroke of genius for software development. From the perspective of the scientific process, one can think of Linus’ Law as a form of informal peer-review where anyone, expert and novice alike, has access to the same primary research record and can contribute by joining the discussion in a great babbling bazaar. The potential benefits to the scientific process are profound, as it not only increases the pace of scientific discoveries but also makes the results more robust, improves reproducibility and disseminates the results faster and wider. The case study Woelfle et al. describes confirms this and shows that Linus’ Law also applies in the realm of scientific advancement.

While researchers from all walks of science and at all stages of their careers would benefit from perusing the ideas put forth in this commentary, the next generation of scientists, in particular, would benefit from familiarizing themselves with the concept of open science and the opportunities that may present themselves by taking the road less travelled the way Woelfle et al. have done. While the cathedral-building method may be the best approach for building a limestone cathedral, evidence is amassing, ever so slowly, that the bazaar model is a faster, more efficient and less error-prone approach for solving some of the most difficult and urgent scientific problems facing humanity. This is not to say, however, that the cathedral model is obsolete and should be abandoned. Individual vision and brilliance still has a place in science. But even in these exceptional circumstances one cannot but wonder what scientific heights may be reached if the brilliance of these exceptional minds would be amplified through the collective talent of the bazaar.

References:
{1} Gowers T, “Comment 1701” in “Gower’s Blog”, “ Feb 2009http://gowers.wordpress.com/2009/02/01/questions-of-procedure/#comment-1701 (accessed 14 Nov 2011).
{2} Raymond ES “The Cathedral and the Bazaar: Musings on Linux and Open Source by an accidental revolutionary.” North Sebastopol: O’Reilly Media, 2001 [ISBN:978-0596001087].

Woelfle, M., Olliaro, P. and Todd, M.H. 2011. Open science is a research accelerator. Nature Chemistry 3: 745-748 (doi: 10.1038/nchem.1149)

This is from the “Mario’s Entangled Bank” blog ( http://pineda-krch.com ) of Mario Pineda-Krch, a theoretical biologist at the University of Alberta.

Posted in closed science, F1000 Biology, Open Notebook science, open science, The Cathedral and the Bazaar | 1 Comment

The Joy of R: A Feline Guide

Just because it’s caturday

The face of feline pleasures. Long heat-generating Sweaving makes you puRR.

The heat sink and me - this sort of thing is my bag baby.

Putting the R into puRR.

Images by Mario Pineda-Krch (CC BY-NC-SA 3.0)

This is from the “Mario’s Entangled Bank” blog ( http://pineda-krch.com ) of Mario Pineda-Krch, a theoretical biologist at the University of Alberta.

Posted in cats, computing, humour, R, Sweave | 3 Comments

Cycles in finite populations: A reproducible seminar in three acts

For this years Halloween I presented the mathematical biology seminar at the Centre for Mathematical Biology. Here is the title and the abstract…

Cycles in finite populations: a reproducible seminar in three acts

Many natural populations exhibit cyclic fluctuations. Explaining the underlying mechanisms of such cycles is a central problem in ecology and has preoccupied population ecologists ever since Elton’s classical work in 1924. Over the years, a wide range of mathematical models have been explored in an attempt to gain understanding of the conditions giving rise to or inhibiting population cycles. Many of these models, however, rely on the assumption that population sizes are infinite, and hence implicitly assume that the effects of demographic stochasticity are negligible.

Here I will show how demographic stochasticity can give rise to regular and persistent population cycles, so-called quasi-cycles, in simple finite consumer-resource models that are deterministically stable. The existence of such quasi-cycles expand the scope of population cycles caused by ecological interactions, thereby complicating the conclusive interpretation of such patterns. I will discuss how quasi-cycles dovetail with existing theory and will also illustrate the feasibility of accurately identifying such cycles by systematically applying a series of simple analyses to simulated data and data from natural populations.

I will be using this presentation to illustrate how reproducible computational science can be practiced.

Regarding the final statement about reproducible computational science… In the name of transparency and executability (if you look at the slides you will understand why this is important) I am making all the information necessary to reproduce this work available here, i.e. slides, source for slides (Sweave code), simulation code (in C), almost all of the data, figures, etc.

The woeful state of affairs of transparency in computational sciences.

While I had (and still have) the intentions of making all the data available the sheer size of the data in this case (60000 files totalling around 18GB) is problematic in terms of sharing. At this point I am simply not able to post this amount of data online. This is clearly going to be an issues in similar future projects so I need to find a solution. The good news is that you are able to recreate the data by running the simulation code (but it takes a while). If you really want to original data I would be happy to provide it but you will probably have to mail me (mail like in envelope and stamp mail) a sizeable memory stick. Please contact me first to make arrangements. Of course, I would also be interested in hearing any possible alternative solutions to this conundrum.

A few words about reproducing this research. The slides (and hence all the research) is done using Sweave with Beamer so you will need to have R (2.13.1) and LaTeX installed. The code for the stochastic simulations is in C (based on the code here) so you’ll need to have an appropriate compiler (i.e. gcc), unless you are running OS X Lion in which case you might be able to use the included binaries.

The whole package (sans data) is available here (6MB) and if you prefer only the PDF slides (5.5 MB) you can get them here.

This is from the “Mario’s Entangled Bank” blog ( http://pineda-krch.com ) of Mario Pineda-Krch, a theoretical biologist at the University of Alberta.

Posted in LaTeX, Open Notebook science, open science, predator-prey model, presentation, programing, R, Sweave | 11 Comments

Open Access(ish) contribution: Cycles in finite populations: a reproducible seminar in three acts

It’s Open Access week and this is what the hoopla is all about

“Open Access” to information – the free, immediate, online access to the results of scholarly research, and the right to use and re-use those results as you need – has the power to transform the way research and scientific inquiry are conducted. It has direct and widespread implications for academia, medicine, science, industry, and for society as a whole.”

I suspect that “access” in this context probably refers to access to published articles (access as in PLoS). Since PLoS Biology without further ado rejected a paper of mine a few weeks ago (something which they clearly will live to regret) I have no other option that providing this measly contribution to this glorious Open Access week.

I am giving a seminar at my department on Halloween (of all days – does this mean I should wear a costume for my talk?). Here are the details,

Title: Cycles in finite populations: a reproducible seminar in three acts

Abstract: Many natural populations exhibit cyclic fluctuations. Explaining the underlying mechanisms of such cycles is a central problem in ecology and has preoccupied population ecologists ever since Elton’s classical work in 1924. Over the years, a wide range of mathematical models have been explored in an attempt to gain understanding of the conditions giving rise to or inhibiting population cycles. Many of these models, however, rely on the assumption that population sizes are infinite, and hence implicitly assume that the effects of demographic stochasticity are negligible. Here I will show how demographic stochasticity can give rise to regular and persistent population cycles, so-called quasi-cycles, in simple finite consumer-resource models that are deterministically stable. The existence of such quasi-cycles expand the scope of population cycles caused by ecological interactions, thereby complicating the conclusive interpretation of such patterns. I will discuss how quasi-cycles dovetail with existing theory and will also illustrate the feasibility of accurately identifying such cycles by systematically applying a series of simple analyses to simulated data and data from natural populations. I will be using this presentation to illustrate how reproducible computational science can be practiced.

How is this related to Open Access? Well, the plan is that the slides will be made using Sweave (aka literate programming) and that they will be made available online (i.e. when they are done, which is probably going to be around 3am August 30). This will essentially be a taste of open notebook science and reproducible research. Weaving the slides will run the simulations (which you might not want to do since they take around 24hrs to run), analyze the data, generate the figures, mash it up in a \LaTeX file and spit out the PDF slides. Well, that is the plan at least. Once this is done, I’ll just write it up and submit it to PLoS Biology.

This is from the “Mario’s Entangled Bank” blog ( http://pineda-krch.com ) of Mario Pineda-Krch, a theoretical biologist at the University of Alberta.

Posted in Open Notebook science, open science, predator-prey model, Sweave, writing | Leave a comment

Vanilla C code for the Stochastic Simulation Algorithm

The Gillespie stochastic simulation algorithm (SSA) is the gold standard for simulating state-based stochastic models. If you are a R buff, a SSA novice and want to get quickly up and running stochastic models (in particular ecological models) that are not overly computationally demanding you might want to consider the GillespieSSA package. The chief advantage of GillespieSSA is that you will be up and running in a whiff. The main disadvantage is that you will be running slow.

If you want more horsepower (as in faster simulations) you might want to look at StochKit (Stochastic Simulation Kit) which is an extensible stochastic simulation framework developed in C++ that aims to make stochastic simulation accessible to practicing biologists and chemists, while remaining open to extension via new stochastic and multiscale algorithms. While StochKit itself is in C++ the model definitions are in XML you can therefore remain blissfuly ignorant of  C++…, but, you do need to have a C++ compiler installed (bye-bye Windoze). I have used StochKit extensively in the past and the XML model definitions (that were introduced in recent versions) really simplify and speed up setting up models. Unfortunately it has also made StochKit rely on having other C++ libraries installed (e.g. libxml2) thus complicating the setup and installation. While I had it all set up and humming along in Snow Leopard it turns out that the recent Lion upgrade utterly butchered my StochKit set up, among other things (Why Steve, why?). Anyway, to make a long story short. After spending half a day trying to reinstall StochKit and making all its dependancies play nice the cat collapsed on top of the StochKit manual in pure exhaustion.

20111023-085423.jpg

StochKit OD

Because the cat is wise beyond her years I took this as an indication that it was time to break-up my affair with StochKit and move on to greener pastures. So if GillespieSSA and StochKit are out of the game, what options remain? Well, there is a whole slew of other software packages and libraries implementing various versions of the SSA. How do you decide? In moments like this seeking advance from someone infinitely wise seems to be a good idea. Since the cat was still out I turned to my next best source of wisdom – Albert Einstein.

The simplest possible SSA implementation I am able to envision is a vanilla C code of Gillespie’s original Direct method, no frills, no dependencies, no whistles and bells. The next step was obviously to ask Google to find it for me…, but alas yet another snag. There simply does not seem to exist a vanilla C code of the SSA (but I am probably wrong). So the next best thing is of course to whip it up on your own. So to fix this glaring omission of the World Wide Web I am hereby posting the very first SSA in vanilla C. A few caveats about the code. Vanilla C means that there are absolutely no whistles and bells, i.e. it does not require any other libraries (e.g. GSL), probably does not use a fancy Mersenne Twister and it does no error checking, but it should compile right out of the box using

gcc -o ssa_lvpp ssa_lvpp.c

but something like

gcc -O3 -funroll-loops -ftree-vectorize -fast -floop-optimize -fstrength-reduce -o ssa_lvpp ssa_lvpp.c

could give marginally better performance (I have not checked). The only other requirement I needed was that the code had to be as simple, brief and as legible as possible and, oh, fast! Just for fun, here is the corresponding model definition using GillespieSSA

parms <- c(b=2, d=1, g=2.5, K=2000, c=5, a=5e-04 )

x0   <- c(N=1000, P=1000)
nu   <- matrix(c(+1, -1, 0,  0,
                  0,  0, 1, -1),nrow=2,byrow=T)
a    <- c("b*N", "d*N+(b-d)/K*N*N + a*P*N","c*a*P*N", "g*P")
tmax <- 100

and to run it do

out <- ssa(x0,a,nu,parms,tmax)

While this is somewhat shorter and easier on the eyes than the C code (albeit vanilla) the GillespieSSA implementation bites the dust in terms of computational performance. The above GillespieSSA version was still at it after 10 minutes, at which point my patience ran out and I terminated it. The same model (and same parameters) using my the C code finished in less than 1 second. That is (at least) 3 (going at 4) orders of magnitude faster people! Now, in all honesty, this large discrepancy in the computational performance is not entirely surprising. Not only will any R code be inevitable slower than the same code in C but GillespieSSA was never made to be fast (yes, you read right), it was made to be simple to use (which it is). A smart alec may perhaps remark that performance and simplicity of use are not mutually exclusive, to which I respond, “Such is life”. In contrast, my vanilla code was explicitly made with speed in mind. So there.

By midnight (I started late) the cat was still passed out on top of the StocKit installation instructions and I had finished the first version of my vanilla C code aka ssa_lvpp.c. lvpp stands for Lotka Volterra Predator-Prey model and is the stochastic implementation of the classical Lotka Volterra predator-prey model where the prey has logistic growth and the predator has a linear functional response,
\frac{\displaystyle dN}{\displaystyle dt} = rN\left(1-\frac{\displaystyle N}{\displaystyle K}\right) - cNP
\frac{\displaystyle dP}{\displaystyle dt} = c*a*N*P-g*P
where the interpretation of the various parameters is left as an exercise to the reader. Without further ado, the code is

// gcc -o ssa_lvpp ssa_lvpp.c
// gcc -O3 -funroll-loops -ftree-vectorize -fast -floop-optimize -fstrength-reduce -o ssa_lvpp ssa_lvpp.c

#include <stdlib.h>
#include <stdio.h>
#include <math.h>

// Define the largest number of time points that can be recorded
#define MAX_DATA 100000

int main(int argc, char **argv)
{
	// Extract parameters from the command line
	double    b = strtod(argv[1], (char **) NULL); // Prey birth rate
	double    d = strtod(argv[2], (char **) NULL); // Prey density independent mortality rate
	double    K = strtod(argv[3], (char **) NULL); // Carrying capacity of the prey
	double    a = strtod(argv[4], (char **) NULL); // Predator attach rate
	double    c = strtod(argv[5], (char **) NULL); // Predator conversion efficiency
	double    g = strtod(argv[6], (char **) NULL); // Predator mortality rate
	double    N = strtod(argv[7], (char **) NULL); // Prey population size
	double    P = strtod(argv[8], (char **) NULL); // Predator population size
	double tmax = strtod(argv[9], (char **) NULL); // Length of simulation
	double    r = b-d;                             // Prey intrinsic growth rate

    // Check that tmax is not larger than the array used for storing results.
    // Here we assume that data is recorded each integer time step, i.e.
    // tmax=MAX_DATA is largest tmax we can use.
	//
    // Note: It would be more elegant to use a C++'s STL vector containers for
    // the data structure.
	if (tmax>(MAX_DATA-1)) {printf("tmax=%d\n",MAX_DATA-1); exit(-1);}

	// Print the parameters and their values
	printf("b=%lf, d=%lf, K=%lf, a=%lf, c=%lf, g=%lf, N=%lf, P=%lf, tmax=%lf\n",
	        b, d, K, a, c, g, N, P, tmax);

	// Calculate some constants to speed thing up
	double constant1 = r/K;
	double constant2 = c*a;

	double t = 0;  // Current time
	double tn = 0; // Next time point at which the state of the system will be recorded
	int row = 0;   // Row number in data array where next data will be recorded
	double data[MAX_DATA][3]; // Data array for recording the state of the system

	// Record the initial sate of the system
	data[row][0] = t;
	data[row][1] = N;
	data[row][2] = P;
	row++;

	// Seed the random number generator
	srand((unsigned)time(NULL));

	// Start the simulation
	do{
		// If it is time, record the state of the system
		if (t>=tn) {
			data[row][0] = t;
			data[row][1] = N;
			data[row][2] = P;
			row++;
			tn++;
		}

		// Determine the total rate of each event
		double prey_birth = b*N;
		double prey_death = d*N+constant1*N*N + a*P*N;
		double pred_birth = constant2*P*N;
		double pred_death = g*P;

		// Determine the total event rate
		double toto = prey_birth + prey_death + pred_birth + pred_death;

		// Determine the time step at which the next event occurs + update the time
		double r = (double)rand() / (double)RAND_MAX;
		t += -1/toto * log10(r);

		// Determine the next event to occur + update the populations
		r = (double)rand() / (double)RAND_MAX;
		double p = r * toto;
		double sum = prey_birth;
		if      (p <= prey_birth) { N++; sum += prey_death;}
		else if (p <= prey_birth+prey_death) { N--; sum += pred_birth;}
		else if (p <= prey_birth+prey_death+pred_birth) P++;
		else P--;

		// Terminate simulation if it has reached tmax or if all populations are extinct
	} while (t<=tmax && (N>0 && P>0));

	// Record the final state of the system
	data[row][0] = t;
	data[row][1] = N;
	data[row][2] = P;
	row++;

	// Send the recorded data to STDOUT
	printf("time, N, P\n");
	int i;
	for(i==0; i
}

To run this code with the same parameters as previously just do (remember there is no error checking so those arguments better be spot on)

./ssa_lvpp 2 1 2500 5e-04 5 2.5 1000 1000 100 1

Finally, if anyone by chance finds the code useful, feel free to use and do whatever you deem fit with it (e.g. modifying it for other models is straightforward). No license, no thanks, no attribution necessary and no blame will be accepted (but I would be happy to get comments). Since I wrote the code I have used it to run in excess of 60000 simulations for a research project (it took 24 hrs on a dinghy lap top, i.e. approximately 1.5 seconds per run if you need to know, now multiply that with 1000 to get an estimate of how long GillespieSSA would be at it).

Disclaimer: the only reason I have no inhibitions trashing GillespieSSA is because I happen to be the author of it. Having said this, for the right models and circumstances it is actually a useful package, e.g. to experiment with SSA and wrap your head around how the SSA works.

This is from the “Mario’s Entangled Bank” blog ( http://pineda-krch.com ) of Mario Pineda-Krch, a theoretical biologist at the University of Alberta.

Posted in computer simulations, computing, cookbook, Daniel Gillespie, Gillespie algorithm, GillespieSSA, R | 1 Comment

Peer reviewed quote of the day

“Just as the corner grocery store lacks variety as compared with a supermarket, so very small populations lack genetic variety (and therefore evolutionary potential) as compared with larger ones.” - Alfred G. Fischer

Fischer, A.G. 1960. Latitudinal variation in organic diversity. Evolution 14: 64-81.

This is from the “Mario’s Entangled Bank” blog ( http://pineda-krch.com ) of Mario Pineda-Krch, a theoretical biologist at the University of Alberta.

Posted in manuscript, peer review | Leave a comment

10 years of seminars in mathematical biology as a cloud

Ten years of seminars in mathematical biology later.


The cloud was formed from the titles of talks given at the Centre for Mathematical Biology between 2001-2011.

This is from the “Mario’s Entangled Bank” blog ( http://pineda-krch.com ) of Mario Pineda-Krch, a theoretical biologist at the University of Alberta.

Posted in CMB, Mathematical biology | Leave a comment

From the lowliest of worms

For the Darwin Song Project Mark Erelli composed and performed a piece entitled Kingdom Come. The song is haunting and if you pay close attention to the lyrics you may realize (or not) that it is as if Darwin himself was speaking (or singing if he was into that) using Mark Erelli’s voice. I don’t mean this metaphorically, but quite literally.

Let me walk you through the lyrics, verse by verse.

The wasp she lays an egg
‘Neath a caterpillar’s skin
It hatches and the larva grows
Feasting from within
It kills the host then off it goes
To sting another one
Seems to me there’s too much misery to believe in Kingdom Come

Kingdom Come
Kingdom Come
Seems to me there’s too much misery to believe in Kingdom Come

This verse talks about the ichneumon wasps where the female finds a host and lays an egg on, near, or inside the host’s body. Upon hatching, the larval ichneumon feeds either externally or internally, killing the host when they themselves are ready to pupate. In a 1960 letter to the American botanist Asa Grey (6 months after the publication of the first edition of the Origins of Species), Darwin expresses his concerns reconciling the habits of this species with the notion of a world created by a benevolent God.

With respect to the theological view of the question; this is always painful to me.— I am bewildered.— I had no intention to write atheistically. But I own that I cannot see, as plainly as others do, & as I shd wish to do, evidence of design & beneficence on all sides of us. There seems to me too much misery in the world. I cannot persuade myself that a beneficent & omnipotent God would have designedly created the Ichneumonidæ with the express intention of their feeding within the living bodies of caterpillars, or that a cat should play with mice. Not believing this, I see no necessity in the belief that the eye was expressly designed. On the other hand I cannot anyhow be contented to view this wonderful universe & especially the nature of man, & to conclude that everything is the result of brute force. I am inclined to look at everything as resulting from designed laws, with the details, whether good or bad, left to the working out of what we may call chance. Not that this notion at all satisfies me. I feel most deeply that the whole subject is too profound for the human intellect. A dog might as well speculate on the mind of Newton.— Let each man hope & believe what he can.—

Lot’s of quotable stuff here, including Mark Erelli’s refrain.

I’ve a secret guaranteed
To make the theologians squirm
All it takes is time to make a man
From the lowliest of worms
We are not the pinnacle
Nobody’s chosen ones
Can the minds of men be trusted with such things as Kingdom Come

Kingdom Come
Kingdom Come
Can the minds of men be trusted with such things as Kingdom Come

Worms were near and dear to Darwin and his The Formation of Vegetable Mould Through the Action of Worms (aka Worms) was the result of extensive experiments carried out indoor in a worm-littered room. In his writings Darwin often referred to the worms as being low in biological organization.

Once I had a daughter
She died before her time
I sponged her fevered brow
And cursed her bilious decline
I feel her like a phantom limb
Pins and needles numb
Friend I cannot comprehend what makes you cling to Kingdom Come

Kingdom Come
Kingdom Come
Friend I cannot comprehend what makes you cling to Kingdom Come
Kingdom Come
Kingdom Come
Seems to me there’s too much misery
Can the minds of men be trusted
I cannot comprehend what makes you cling to Kingdom Come

This verse refers to the death Anne Elizabeth “Annie” (1841–1851) Darwin’s second child and first daughter, who was diagnosed with “bilious fever with typhoid character”. She died at the age of 10 from tuberculosis. The death of Anne was devastating and in a memorial of his daughter Darwin wrote:

We have lost the joy of the Household, and the solace of our old age:— she must have known how we loved her; oh that she could now know how deeply, how tenderly we do still & shall ever love her dear joyous face. Blessings on her.—

Annie’s death was likely a major contributing factor contributing to Darwin severing the ties with Christianity and the church.

I’ve known communion in conviction
And the loneliness of doubt
Does the master mark the sparrow’s fall
Or just let it all play out
If only for your sake my love
I wish I could succumb
Bridge the void and join for eternity in your Kingdom Come

Kingdom Come
Kingdom Come
Bridge the void and join for eternity in your Kingdom Come
Kingdom Come
Kingdom Come
Bridge the void and join for eternity in your Kingdom Come
Kingdom Come
Kingdom Come
Seems to me there’s too much misery to believe in Kingdom Come
Kingdom Come
Kingdom Come
Can the minds of men be trusted with such things as Kingdom Come

Darwin was troubled by the implications of his theory and, as a consequence, held off publishing it for many years. In an 1844 letter to J.D. Hooker (15 years before the publication of The Origin of Species), Darwin compares the realization of the immutability of species as to confessing a murder,

At last gleams of light have come, & I am almost convinced (quite contrary to opinion I started with) that species are not (it is like confessing a murder) immutable. Heaven forfend me from Lamarck nonsense of a “tendency to progression” “adaptations from the slow willing of animals” &c,—but the conclusions I am led to are not widely different from his—though the means of change are wholly so— I think I have found out (here’s presumption!) the simple way by which species become exquisitely adapted to various ends.

And to round off, here is Mark Erelli’s performance at the Shrewsbury Folk Festival where he performed this song as part of the Darwin Song Project.

This is from the “Mario’s Entangled Bank” blog ( http://pineda-krch.com ) of Mario Pineda-Krch, a theoretical biologist at the University of Alberta.

Posted in Charles Darwin, evolution, fatherhood, Music, natural history | Leave a comment

Darwinian medicine according to a 6 year old

It’s late afternoon and dad is lying sprawled out on the bed with the lights off having a massive migraine. 6 year old enters the dark bedroom with a book under his arm.

6yr old: (holding up the book) Dad, maybe this book will make you feel better.

Dad squints at the book barely making out the title, “On the Origin of Species”.

Darwinian medicine at its best. And, yes, it made dad feel better.

This is from the “Mario’s Entangled Bank” blog ( http://pineda-krch.com ) of Mario Pineda-Krch, a theoretical biologist at the University of Alberta.

Posted in children, fatherhood, Origin of Species | Leave a comment

Steve and I

It’s a sad day as guru Steve leaves us. Unbeknownst to Steve, however, Steve and I did designed a unique piece of art together. While he provided the blank slate I made further aesthetic improvements turning an everyday apple into a one of a kind work of art.

This is from the “Mario’s Entangled Bank” blog ( http://pineda-krch.com ) of Mario Pineda-Krch, a theoretical biologist at the University of Alberta.

Posted in Uncategorized | Leave a comment

(everybody shout) SHOW ME THE DATA: A presubmission inquiry in one-act

Some Open Access journals require presubmission inquiries. Most of them require you to write a sort of mini-paper of your full paper, you send it to them and then the editor gives you their (virtual) thumbs up or down for a full submission. While I appreciate that the presubmission inquiry requirement streamlines the editorial process, particularly for journals receiving boat loads of submission, it is a time-consuming process for the submission wannabe (aka author). Good thing that one can recycle and reuse these “mini-papers”.

Imagine being able to do the presubmission inquiry by phone. Such conversation could transpire as follows.

SHOW ME THE DATA: A presubmission inquiry in one-act

Any resemblance between the characters in this work and any persons, living, dead or undead, is a miracle. This story is based on fact, though on occasion the author has taken certain, very small, liberties with the details of the events, because that is his right as an Canadian. Any similarity with fictitious events or characters is purely coincidental.

AE: Academic Editor of large high-impact Open Access journal
A: Author making a presubmission inquiry

A: Rod! How ya doing? Jerry here.
ROD the Academic Editor, begins this conversation by the coffee machine. He fixes Academic Editor-in-Chief a cup of coffee as he talks. In the background, monitoring the crisis are the Editorial Board Members
AE:“How am I doing?” I’ll tell you. I’m sweatin’, dude! That’s how I’m “doin’.” I’m sweatin’ submissions. I’m sweatin’ Eisen calling and telling me I’m sending out too many submission for review. I’m sweatin’. You hear what I’m saying?
A: I hear what you’re saying…
AE: No. I hear that you hear what I’m saying. But do you hear what I’m saying?
Jerry is looking at his watch.
AE: Alright, we’re just getting started on my list of things you need to know. Take notes if you want to.
A: (dying) Mmm. Hmm.
The Academic Editor walks down the hallway, past clippings of journal covers and citations. The Academic Editor-in-Chief follows, always listening.
AE: This is what you’re gonna do for me.
A: What can I do for you? You just tell me.
AE: It’s personal and very important. Hell! It’s a family motto. Are you ready, Jerry?
A: I’m ready.
AE: Here it is. Show me the data. (pause) Show. Me. The. Data. Say it with me, Jerry.
A: Show you the data.
AE: No, you can do better than that. Eisen’s on the other line.
A: Show you the data.
AE: No, show me the data.
A: Show me the data.
AE: Louder!
A: (loud voice) Show me the data.
AE: You’ve got to yell that shit!
A: (yelling) Show me the data!
AE: Louder, Jerry.
A: (screaming at the top of his lungs) Show me the data! Show me the data!
AE: (yelling) You love this Open Science stuff.
A: (screaming at the top of his lungs) I love the Open Science stuff! I love Open Science. I love Open Science. Show me the data!
AE: Congratulations, you may now submit your paper through our online submission system. Please note that this positive response is not a commitment to peer-review the article formally, but it is an expression of interest on the basis of the information you provide in this presubmission inquiry.

This is from the “Mario’s Entangled Bank” blog ( http://pineda-krch.com ) of Mario Pineda-Krch, a theoretical biologist at the University of Alberta.

Posted in humour, manuscript, open science, peer review | Leave a comment

The magic of reality is (as of today) real: out with the mumbo-jumbo in with reasoning and logic

Today was the day that the most highly anticipated release of the year was scheduled to take place. No it’s not the new iPhone (I could not care less), or the announcement of this years Nobel laureates in Physics (which I do care about), but rather Richard Dawkins’ new book The Magic of Reality.

Six copies arrived at our local bookstore and I am now the proud owner of one of them. In-between manuscript writing and brewing coffee I have perused this beautifully illustrated book. A few initial observations are in place,

  • There is no Bible in the index
  • It’s written god (look ma, no caps)
  • Did I say that the illustrations are amazing?
  • As far as origin myths go, my favourite is the Norse one
  • Why have one god when you can have many, there is so much more potential for juicy intrigues that way
  • Each chapter is a question
  • Each chapter starts with mythical answers to the question, followed by the real answer (aka scientific answer) (aka the truth).
  • “…mythological answers … are colourful and interesting, and real people have believed them. Some people still do.” It is the “still do” that is scary as hell. Reading about the Norse mythology is amusing and entertaining, but I am sure it was not amusing 2000 years ago if you lived in Sweden. The stories would have probably scared the living bejesus out of you and whatever the high priest (or whoever was in charge) commanded was the law or else… While nobody believes in Norse mythology anymore there are plenty of other mythologies floating around with masses of hard-core believers that will do anything the high priest (or whoever is in charge) commands them to or else… Scary stuff.
  • Teaching mythologies to children should be mandatory, teaching them that they are true should be illegal.
  • Oh, and there are also some minor inconsistency issues in some of the mythologies…

In other myths, the sun is not a god but one of the first creations of a god. In the creation myth of the Hebrew tribe of the Middle Eastern desert, the tribal god YHWH created light on the first of his six days of creation – but then, weirdly, he didn’t create the sun until the fourth day! ‘And God made two great lights: the greater light to rule the day, and the lesser light to rule the night: he made the stars also.’ Where the light came from on the first day, before the sun existed, we are not told.

It is time to turn to reality, and the true nature of the sun, as borne out by scientific evidence…

Over and out.

This is from the “Mario’s Entangled Bank” blog ( http://pineda-krch.com ) of Mario Pineda-Krch, a theoretical biologist at the University of Alberta.

Posted in Richard Dawkins, science | Leave a comment

Charles Darwin according to a 6 year old

The following conversation transpired today on the way home from the grocery store.

Me: So what can you tell me about Charles Darwin?

6yr old: He was a scientist, he was an entomologist, he was a father of 10, and he changed the world.

Aww, never has a father been so proud. Obviously we are doing something right in our home schooling.

This is from the “Mario’s Entangled Bank” blog ( http://pineda-krch.com ) of Mario Pineda-Krch, a theoretical biologist at the University of Alberta.

Posted in Charles Darwin, children | Leave a comment

The Magic of Reality: Finally a reason to get the iPad

I have never really understood the purpose of the iPad. Anything I would ever want to use the iPad for its little brother the iPod Touch can do equally well + it fits in your pocket. Anything the iPod Touch cannot do, i.e. the stuff one needs a laptop for, the iPad cannot do either. So why on earth would one want to spend $500+ on a device that has the same functionality as a gadget half its size and for half the price while being no more portable than a laptop but substantially less versatile?

It appears, however, that this state of affairs might soon change. On October 4 Richard Dawkins’ new book The Magic of Reality (Illustrated by Dave McKean) is being released. From all I have seen and heard about the book it promises to be an amazing educational resource for children of all ages. One thing is for sure, I’ll be the first one lining up at my local bookstore on October 4 to secure my own copy.

It turns out that there will also be an app for the book and, as far I can tell, it appears to be specific to the iPad (bummer). So there, that’s what I want for my birthday, an iPad with the The Magic of Reality app installed.

This is from the “Mario’s Entangled Bank” blog (http://pineda-krch.com) of Mario Pineda-Krch, a theoretical biologist at the University of Alberta.

Posted in Richard Dawkins, science | 3 Comments

Turbo charged Open Science: this sort of thing is my bag, baby

A remarkable commentary appeared today in Nature Chemistry entitled Open Science is a research accelerator. In this commentary Michael Woelfle, Piero Olliaro and Matthew H. Todd describe a case study of an Open Science research project they conducted with the aim devising an alternative method for producing a compound used in treating schistosomiasis (the scientific results of this study are reported in PLoS Neglected Tropical Diseases). The key aspect of this project making it unique is that the research was conducted in full public view and in real-time on the Internet (at The Synaptic Leap). The key take home message, other than showing that Open Science can be used to successfully solve difficult scientific problems, is that the research was accelerated by being open. Experts identified themselves and collaborations came about that otherwise would not have taken place ultimately resulting in the research proceeding faster than what it otherwise would have done. These are remarkable results. While the putative benefits of Open Science are often discussed (e.g. here, here, here) this is, as far as I know, the first time this type of synergistic benefit arising in Open Science research has been reported in a scientific journal (and a high-ranking one at that).

Much much more could be said about this paper and the outcome of this Open Science experiment and I will likely provide a more in-depth discussion in due time. So with that in mind I’ll keep things short this time and wrap it up quoting Mr. Danger Powers (or at least what he might have said if he would have been a scientist),

“As long as people are doing science with many anonymous contributors while at the same time experimenting with Open Science in a consequence free environment, I’ll be sound as a pound!”

This sort of thing is my bag

This is from the “Mario’s Entangled Bank” blog (http://pineda-krch.com) of Mario Pineda-Krch, a theoretical biologist at the University of Alberta.

Posted in Open Notebook science, open science, The Cathedral and the Bazaar | 2 Comments

Evolution according to kids

This is from the “Mario’s Entangled Bank” blog (http://pineda-krch.com) of Mario Pineda-Krch, a theoretical biologist at the University of Alberta.

Posted in children, evolution | Leave a comment

Sally Otto awarded a MacArthur Fellowship

Great news in the ether today. Evolutionary geneticist Sally Otto has been awarded the MacArthur Fellowship. Awards get handed out all the time to deserving scientists but this one hits home base. Sally is a friend and a colleague and, to me, it is obvious that no one is more deserving of this accolade than her. As an awe-inspiring scientist, remarkable advisor, awesome mentor and wonderful colleague this award recognizes what anyone that knows Sally has known all this time. As another colleague succinctly expressed it, this is “official recognition for what her students have been saying all these years”.

For those not familiar with the MacArthur fellowship. The award has been nicknamed ”the genius award” and is awarded yearly to 20-40 individuals in any field of endeavour that ”show exceptional merit and promise for continued and enhanced creative work.“ (yup, that’s Sally for you)

She now shares the hall of fame with other exceptional women in evolutionary biology and genetics, e.g. Beth Shapiro (2009),  Kirsten Bomblies (2008), Nancy Moran (1997), and Barbara McClintock (1981) (who of course also was awarded the 1983 Nobel Prize in Physiology or Medicine for her discovery of transposons).

Congratulations Sally. You are an inspiration to us all.

Thanx and a tip of the Hatlo hat to Dilara

This is from the “Mario’s Entangled Bank” blog (http://pineda-krch.com) of Mario Pineda-Krch, a theoretical biologist at the University of Alberta.

Posted in evolution, Sally Otto | Leave a comment

Choosing the tools of Open Notebook Science – redux

After tinkering with and pondering about my Open Notebook Science setup it looks like this time I will have to eat my own words. Specifically my statement “a wiki does not make a good platform for my Open Notebook Science project“. Although in principle I stand by my words it turns out that in the category of unsatisfactory ONS set-ups certain wikis are simply less bad.
Continue reading

Posted in Open Notebook science | 1 Comment

Choosing the tools of Open Notebook Science

I will always remember my post Starting an Open Notebook Science project as the day that I almost started with ONS. But, complications arose, ensued, were overcome and here I am ready again to embark on my own ONS experiment. After intently following the Open Notebook Science efforts of several people over the year, specifically Rosie Redfield’s RRResearch, Carl Boettiger’s lab notebook, Jean-Claude Bradley’s UsefulChem, Garrett Lisi’s Deferential Geometry, and Jeremiah Faith’s lab notebook, it is obvious that there are many different ways of doing ONS. As I am getting ready to embark on my own ONS experiment the question of what the best approach is for me comes to mind. This is an attempt to address this question.

Continue reading

Posted in Open Notebook science | 4 Comments

One step closer to Open Notebook Science

One step closer to Open Notebook Science right here. Now just hacking the tools.

This is from the “Mario’s Entangled Bank” blog (http://pineda-krch.com) of Mario Pineda-Krch, a theoretical biologist at the University of Alberta.

Posted in Open Notebook science | Leave a comment

What is R, really?

On CRAN, the official web home of all things R it says,

R is a free software environment for statistical computing and graphics.

Well, that sounds all hunky dory. But let’s take a close look at what this statement really says.

Continue reading

Posted in computing, open science, R | 7 Comments