Cycles in finite populations: A reproducible seminar in three acts

For this years Halloween I presented the mathematical biology seminar at the Centre for Mathematical Biology. Here is the title and the abstract…

Cycles in finite populations: a reproducible seminar in three acts

Many natural populations exhibit cyclic fluctuations. Explaining the underlying mechanisms of such cycles is a central problem in ecology and has preoccupied population ecologists ever since Elton’s classical work in 1924. Over the years, a wide range of mathematical models have been explored in an attempt to gain understanding of the conditions giving rise to or inhibiting population cycles. Many of these models, however, rely on the assumption that population sizes are infinite, and hence implicitly assume that the effects of demographic stochasticity are negligible.

Here I will show how demographic stochasticity can give rise to regular and persistent population cycles, so-called quasi-cycles, in simple finite consumer-resource models that are deterministically stable. The existence of such quasi-cycles expand the scope of population cycles caused by ecological interactions, thereby complicating the conclusive interpretation of such patterns. I will discuss how quasi-cycles dovetail with existing theory and will also illustrate the feasibility of accurately identifying such cycles by systematically applying a series of simple analyses to simulated data and data from natural populations.

I will be using this presentation to illustrate how reproducible computational science can be practiced.

Regarding the final statement about reproducible computational science… In the name of transparency and executability (if you look at the slides you will understand why this is important) I am making all the information necessary to reproduce this work available here, i.e. slides, source for slides (Sweave code), simulation code (in C), almost all of the data, figures, etc.

The woeful state of affairs of transparency in computational sciences.

While I had (and still have) the intentions of making all the data available the sheer size of the data in this case (60000 files totalling around 18GB) is problematic in terms of sharing. At this point I am simply not able to post this amount of data online. This is clearly going to be an issues in similar future projects so I need to find a solution. The good news is that you are able to recreate the data by running the simulation code (but it takes a while). If you really want to original data I would be happy to provide it but you will probably have to mail me (mail like in envelope and stamp mail) a sizeable memory stick. Please contact me first to make arrangements. Of course, I would also be interested in hearing any possible alternative solutions to this conundrum.

A few words about reproducing this research. The slides (and hence all the research) is done using Sweave with Beamer so you will need to have R (2.13.1) and LaTeX installed. The code for the stochastic simulations is in C (based on the code here) so you’ll need to have an appropriate compiler (i.e. gcc), unless you are running OS X Lion in which case you might be able to use the included binaries.

The whole package (sans data) is available here (6MB) and if you prefer only the PDF slides (5.5 MB) you can get them here.

This is from the “Mario’s Entangled Bank” blog ( http://pineda-krch.com ) of Mario Pineda-Krch, a theoretical biologist at the University of Alberta.

Posted in LaTeX, Open Notebook science, open science, predator-prey model, presentation, programing, R, Sweave | 11 Comments

Open Access(ish) contribution: Cycles in finite populations: a reproducible seminar in three acts

It’s Open Access week and this is what the hoopla is all about

“Open Access” to information – the free, immediate, online access to the results of scholarly research, and the right to use and re-use those results as you need – has the power to transform the way research and scientific inquiry are conducted. It has direct and widespread implications for academia, medicine, science, industry, and for society as a whole.”

I suspect that “access” in this context probably refers to access to published articles (access as in PLoS). Since PLoS Biology without further ado rejected a paper of mine a few weeks ago (something which they clearly will live to regret) I have no other option that providing this measly contribution to this glorious Open Access week.

I am giving a seminar at my department on Halloween (of all days – does this mean I should wear a costume for my talk?). Here are the details,

Title: Cycles in finite populations: a reproducible seminar in three acts

Abstract: Many natural populations exhibit cyclic fluctuations. Explaining the underlying mechanisms of such cycles is a central problem in ecology and has preoccupied population ecologists ever since Elton’s classical work in 1924. Over the years, a wide range of mathematical models have been explored in an attempt to gain understanding of the conditions giving rise to or inhibiting population cycles. Many of these models, however, rely on the assumption that population sizes are infinite, and hence implicitly assume that the effects of demographic stochasticity are negligible. Here I will show how demographic stochasticity can give rise to regular and persistent population cycles, so-called quasi-cycles, in simple finite consumer-resource models that are deterministically stable. The existence of such quasi-cycles expand the scope of population cycles caused by ecological interactions, thereby complicating the conclusive interpretation of such patterns. I will discuss how quasi-cycles dovetail with existing theory and will also illustrate the feasibility of accurately identifying such cycles by systematically applying a series of simple analyses to simulated data and data from natural populations. I will be using this presentation to illustrate how reproducible computational science can be practiced.

How is this related to Open Access? Well, the plan is that the slides will be made using Sweave (aka literate programming) and that they will be made available online (i.e. when they are done, which is probably going to be around 3am August 30). This will essentially be a taste of open notebook science and reproducible research. Weaving the slides will run the simulations (which you might not want to do since they take around 24hrs to run), analyze the data, generate the figures, mash it up in a \LaTeX file and spit out the PDF slides. Well, that is the plan at least. Once this is done, I’ll just write it up and submit it to PLoS Biology.

This is from the “Mario’s Entangled Bank” blog ( http://pineda-krch.com ) of Mario Pineda-Krch, a theoretical biologist at the University of Alberta.

Posted in Open Notebook science, open science, predator-prey model, Sweave, writing | Leave a comment

Vanilla C code for the Stochastic Simulation Algorithm

The Gillespie stochastic simulation algorithm (SSA) is the gold standard for simulating state-based stochastic models. If you are a R buff, a SSA novice and want to get quickly up and running stochastic models (in particular ecological models) that are not overly computationally demanding you might want to consider the GillespieSSA package. The chief advantage of GillespieSSA is that you will be up and running in a whiff. The main disadvantage is that you will be running slow.

If you want more horsepower (as in faster simulations) you might want to look at StochKit (Stochastic Simulation Kit) which is an extensible stochastic simulation framework developed in C++ that aims to make stochastic simulation accessible to practicing biologists and chemists, while remaining open to extension via new stochastic and multiscale algorithms. While StochKit itself is in C++ the model definitions are in XML you can therefore remain blissfuly ignorant of  C++…, but, you do need to have a C++ compiler installed (bye-bye Windoze). I have used StochKit extensively in the past and the XML model definitions (that were introduced in recent versions) really simplify and speed up setting up models. Unfortunately it has also made StochKit rely on having other C++ libraries installed (e.g. libxml2) thus complicating the setup and installation. While I had it all set up and humming along in Snow Leopard it turns out that the recent Lion upgrade utterly butchered my StochKit set up, among other things (Why Steve, why?). Anyway, to make a long story short. After spending half a day trying to reinstall StochKit and making all its dependancies play nice the cat collapsed on top of the StochKit manual in pure exhaustion.

20111023-085423.jpg

StochKit OD

Because the cat is wise beyond her years I took this as an indication that it was time to break-up my affair with StochKit and move on to greener pastures. So if GillespieSSA and StochKit are out of the game, what options remain? Well, there is a whole slew of other software packages and libraries implementing various versions of the SSA. How do you decide? In moments like this seeking advance from someone infinitely wise seems to be a good idea. Since the cat was still out I turned to my next best source of wisdom – Albert Einstein.

The simplest possible SSA implementation I am able to envision is a vanilla C code of Gillespie’s original Direct method, no frills, no dependencies, no whistles and bells. The next step was obviously to ask Google to find it for me…, but alas yet another snag. There simply does not seem to exist a vanilla C code of the SSA (but I am probably wrong). So the next best thing is of course to whip it up on your own. So to fix this glaring omission of the World Wide Web I am hereby posting the very first SSA in vanilla C. A few caveats about the code. Vanilla C means that there are absolutely no whistles and bells, i.e. it does not require any other libraries (e.g. GSL), probably does not use a fancy Mersenne Twister and it does no error checking, but it should compile right out of the box using

gcc -o ssa_lvpp ssa_lvpp.c

but something like

gcc -O3 -funroll-loops -ftree-vectorize -fast -floop-optimize -fstrength-reduce -o ssa_lvpp ssa_lvpp.c

could give marginally better performance (I have not checked). The only other requirement I needed was that the code had to be as simple, brief and as legible as possible and, oh, fast! Just for fun, here is the corresponding model definition using GillespieSSA

parms <- c(b=2, d=1, g=2.5, K=2000, c=5, a=5e-04 )

x0   <- c(N=1000, P=1000)
nu   <- matrix(c(+1, -1, 0,  0,
                  0,  0, 1, -1),nrow=2,byrow=T)
a    <- c("b*N", "d*N+(b-d)/K*N*N + a*P*N","c*a*P*N", "g*P")
tmax <- 100

and to run it do

out <- ssa(x0,a,nu,parms,tmax)

While this is somewhat shorter and easier on the eyes than the C code (albeit vanilla) the GillespieSSA implementation bites the dust in terms of computational performance. The above GillespieSSA version was still at it after 10 minutes, at which point my patience ran out and I terminated it. The same model (and same parameters) using my the C code finished in less than 1 second. That is (at least) 3 (going at 4) orders of magnitude faster people! Now, in all honesty, this large discrepancy in the computational performance is not entirely surprising. Not only will any R code be inevitable slower than the same code in C but GillespieSSA was never made to be fast (yes, you read right), it was made to be simple to use (which it is). A smart alec may perhaps remark that performance and simplicity of use are not mutually exclusive, to which I respond, “Such is life”. In contrast, my vanilla code was explicitly made with speed in mind. So there.

By midnight (I started late) the cat was still passed out on top of the StocKit installation instructions and I had finished the first version of my vanilla C code aka ssa_lvpp.c. lvpp stands for Lotka Volterra Predator-Prey model and is the stochastic implementation of the classical Lotka Volterra predator-prey model where the prey has logistic growth and the predator has a linear functional response,
\frac{\displaystyle dN}{\displaystyle dt} = rN\left(1-\frac{\displaystyle N}{\displaystyle K}\right) - cNP
\frac{\displaystyle dP}{\displaystyle dt} = c*a*N*P-g*P
where the interpretation of the various parameters is left as an exercise to the reader. Without further ado, the code is

// gcc -o ssa_lvpp ssa_lvpp.c
// gcc -O3 -funroll-loops -ftree-vectorize -fast -floop-optimize -fstrength-reduce -o ssa_lvpp ssa_lvpp.c

#include <stdlib.h>
#include <stdio.h>
#include <math.h>

// Define the largest number of time points that can be recorded
#define MAX_DATA 100000

int main(int argc, char **argv)
{
	// Extract parameters from the command line
	double    b = strtod(argv[1], (char **) NULL); // Prey birth rate
	double    d = strtod(argv[2], (char **) NULL); // Prey density independent mortality rate
	double    K = strtod(argv[3], (char **) NULL); // Carrying capacity of the prey
	double    a = strtod(argv[4], (char **) NULL); // Predator attach rate
	double    c = strtod(argv[5], (char **) NULL); // Predator conversion efficiency
	double    g = strtod(argv[6], (char **) NULL); // Predator mortality rate
	double    N = strtod(argv[7], (char **) NULL); // Prey population size
	double    P = strtod(argv[8], (char **) NULL); // Predator population size
	double tmax = strtod(argv[9], (char **) NULL); // Length of simulation
	double    r = b-d;                             // Prey intrinsic growth rate

    // Check that tmax is not larger than the array used for storing results.
    // Here we assume that data is recorded each integer time step, i.e.
    // tmax=MAX_DATA is largest tmax we can use.
	//
    // Note: It would be more elegant to use a C++'s STL vector containers for
    // the data structure.
	if (tmax>(MAX_DATA-1)) {printf("tmax=%d\n",MAX_DATA-1); exit(-1);}

	// Print the parameters and their values
	printf("b=%lf, d=%lf, K=%lf, a=%lf, c=%lf, g=%lf, N=%lf, P=%lf, tmax=%lf\n",
	        b, d, K, a, c, g, N, P, tmax);

	// Calculate some constants to speed thing up
	double constant1 = r/K;
	double constant2 = c*a;

	double t = 0;  // Current time
	double tn = 0; // Next time point at which the state of the system will be recorded
	int row = 0;   // Row number in data array where next data will be recorded
	double data[MAX_DATA][3]; // Data array for recording the state of the system

	// Record the initial sate of the system
	data[row][0] = t;
	data[row][1] = N;
	data[row][2] = P;
	row++;

	// Seed the random number generator
	srand((unsigned)time(NULL));

	// Start the simulation
	do{
		// If it is time, record the state of the system
		if (t>=tn) {
			data[row][0] = t;
			data[row][1] = N;
			data[row][2] = P;
			row++;
			tn++;
		}

		// Determine the total rate of each event
		double prey_birth = b*N;
		double prey_death = d*N+constant1*N*N + a*P*N;
		double pred_birth = constant2*P*N;
		double pred_death = g*P;

		// Determine the total event rate
		double toto = prey_birth + prey_death + pred_birth + pred_death;

		// Determine the time step at which the next event occurs + update the time
		double r = (double)rand() / (double)RAND_MAX;
		t += -1/toto * log10(r);

		// Determine the next event to occur + update the populations
		r = (double)rand() / (double)RAND_MAX;
		double p = r * toto;
		double sum = prey_birth;
		if      (p <= prey_birth) { N++; sum += prey_death;}
		else if (p <= prey_birth+prey_death) { N--; sum += pred_birth;}
		else if (p <= prey_birth+prey_death+pred_birth) P++;
		else P--;

		// Terminate simulation if it has reached tmax or if all populations are extinct
	} while (t<=tmax && (N>0 && P>0));

	// Record the final state of the system
	data[row][0] = t;
	data[row][1] = N;
	data[row][2] = P;
	row++;

	// Send the recorded data to STDOUT
	printf("time, N, P\n");
	int i;
	for(i==0; i
}

To run this code with the same parameters as previously just do (remember there is no error checking so those arguments better be spot on)

./ssa_lvpp 2 1 2500 5e-04 5 2.5 1000 1000 100 1

Finally, if anyone by chance finds the code useful, feel free to use and do whatever you deem fit with it (e.g. modifying it for other models is straightforward). No license, no thanks, no attribution necessary and no blame will be accepted (but I would be happy to get comments). Since I wrote the code I have used it to run in excess of 60000 simulations for a research project (it took 24 hrs on a dinghy lap top, i.e. approximately 1.5 seconds per run if you need to know, now multiply that with 1000 to get an estimate of how long GillespieSSA would be at it).

Disclaimer: the only reason I have no inhibitions trashing GillespieSSA is because I happen to be the author of it. Having said this, for the right models and circumstances it is actually a useful package, e.g. to experiment with SSA and wrap your head around how the SSA works.

This is from the “Mario’s Entangled Bank” blog ( http://pineda-krch.com ) of Mario Pineda-Krch, a theoretical biologist at the University of Alberta.

Posted in computer simulations, computing, cookbook, Daniel Gillespie, Gillespie algorithm, GillespieSSA, R | 1 Comment

Peer reviewed quote of the day

“Just as the corner grocery store lacks variety as compared with a supermarket, so very small populations lack genetic variety (and therefore evolutionary potential) as compared with larger ones.” - Alfred G. Fischer

Fischer, A.G. 1960. Latitudinal variation in organic diversity. Evolution 14: 64-81.

This is from the “Mario’s Entangled Bank” blog ( http://pineda-krch.com ) of Mario Pineda-Krch, a theoretical biologist at the University of Alberta.

Posted in manuscript, peer review | Leave a comment

10 years of seminars in mathematical biology as a cloud

Ten years of seminars in mathematical biology later.


The cloud was formed from the titles of talks given at the Centre for Mathematical Biology between 2001-2011.

This is from the “Mario’s Entangled Bank” blog ( http://pineda-krch.com ) of Mario Pineda-Krch, a theoretical biologist at the University of Alberta.

Posted in CMB, Mathematical biology | Leave a comment

From the lowliest of worms

For the Darwin Song Project Mark Erelli composed and performed a piece entitled Kingdom Come. The song is haunting and if you pay close attention to the lyrics you may realize (or not) that it is as if Darwin himself was speaking (or singing if he was into that) using Mark Erelli’s voice. I don’t mean this metaphorically, but quite literally.

Let me walk you through the lyrics, verse by verse.

The wasp she lays an egg
‘Neath a caterpillar’s skin
It hatches and the larva grows
Feasting from within
It kills the host then off it goes
To sting another one
Seems to me there’s too much misery to believe in Kingdom Come

Kingdom Come
Kingdom Come
Seems to me there’s too much misery to believe in Kingdom Come

This verse talks about the ichneumon wasps where the female finds a host and lays an egg on, near, or inside the host’s body. Upon hatching, the larval ichneumon feeds either externally or internally, killing the host when they themselves are ready to pupate. In a 1960 letter to the American botanist Asa Grey (6 months after the publication of the first edition of the Origins of Species), Darwin expresses his concerns reconciling the habits of this species with the notion of a world created by a benevolent God.

With respect to the theological view of the question; this is always painful to me.— I am bewildered.— I had no intention to write atheistically. But I own that I cannot see, as plainly as others do, & as I shd wish to do, evidence of design & beneficence on all sides of us. There seems to me too much misery in the world. I cannot persuade myself that a beneficent & omnipotent God would have designedly created the Ichneumonidæ with the express intention of their feeding within the living bodies of caterpillars, or that a cat should play with mice. Not believing this, I see no necessity in the belief that the eye was expressly designed. On the other hand I cannot anyhow be contented to view this wonderful universe & especially the nature of man, & to conclude that everything is the result of brute force. I am inclined to look at everything as resulting from designed laws, with the details, whether good or bad, left to the working out of what we may call chance. Not that this notion at all satisfies me. I feel most deeply that the whole subject is too profound for the human intellect. A dog might as well speculate on the mind of Newton.— Let each man hope & believe what he can.—

Lot’s of quotable stuff here, including Mark Erelli’s refrain.

I’ve a secret guaranteed
To make the theologians squirm
All it takes is time to make a man
From the lowliest of worms
We are not the pinnacle
Nobody’s chosen ones
Can the minds of men be trusted with such things as Kingdom Come

Kingdom Come
Kingdom Come
Can the minds of men be trusted with such things as Kingdom Come

Worms were near and dear to Darwin and his The Formation of Vegetable Mould Through the Action of Worms (aka Worms) was the result of extensive experiments carried out indoor in a worm-littered room. In his writings Darwin often referred to the worms as being low in biological organization.

Once I had a daughter
She died before her time
I sponged her fevered brow
And cursed her bilious decline
I feel her like a phantom limb
Pins and needles numb
Friend I cannot comprehend what makes you cling to Kingdom Come

Kingdom Come
Kingdom Come
Friend I cannot comprehend what makes you cling to Kingdom Come
Kingdom Come
Kingdom Come
Seems to me there’s too much misery
Can the minds of men be trusted
I cannot comprehend what makes you cling to Kingdom Come

This verse refers to the death Anne Elizabeth “Annie” (1841–1851) Darwin’s second child and first daughter, who was diagnosed with “bilious fever with typhoid character”. She died at the age of 10 from tuberculosis. The death of Anne was devastating and in a memorial of his daughter Darwin wrote:

We have lost the joy of the Household, and the solace of our old age:— she must have known how we loved her; oh that she could now know how deeply, how tenderly we do still & shall ever love her dear joyous face. Blessings on her.—

Annie’s death was likely a major contributing factor contributing to Darwin severing the ties with Christianity and the church.

I’ve known communion in conviction
And the loneliness of doubt
Does the master mark the sparrow’s fall
Or just let it all play out
If only for your sake my love
I wish I could succumb
Bridge the void and join for eternity in your Kingdom Come

Kingdom Come
Kingdom Come
Bridge the void and join for eternity in your Kingdom Come
Kingdom Come
Kingdom Come
Bridge the void and join for eternity in your Kingdom Come
Kingdom Come
Kingdom Come
Seems to me there’s too much misery to believe in Kingdom Come
Kingdom Come
Kingdom Come
Can the minds of men be trusted with such things as Kingdom Come

Darwin was troubled by the implications of his theory and, as a consequence, held off publishing it for many years. In an 1844 letter to J.D. Hooker (15 years before the publication of The Origin of Species), Darwin compares the realization of the immutability of species as to confessing a murder,

At last gleams of light have come, & I am almost convinced (quite contrary to opinion I started with) that species are not (it is like confessing a murder) immutable. Heaven forfend me from Lamarck nonsense of a “tendency to progression” “adaptations from the slow willing of animals” &c,—but the conclusions I am led to are not widely different from his—though the means of change are wholly so— I think I have found out (here’s presumption!) the simple way by which species become exquisitely adapted to various ends.

And to round off, here is Mark Erelli’s performance at the Shrewsbury Folk Festival where he performed this song as part of the Darwin Song Project.

This is from the “Mario’s Entangled Bank” blog ( http://pineda-krch.com ) of Mario Pineda-Krch, a theoretical biologist at the University of Alberta.

Posted in Charles Darwin, evolution, fatherhood, Music, natural history | Leave a comment

Darwinian medicine according to a 6 year old

It’s late afternoon and dad is lying sprawled out on the bed with the lights off having a massive migraine. 6 year old enters the dark bedroom with a book under his arm.

6yr old: (holding up the book) Dad, maybe this book will make you feel better.

Dad squints at the book barely making out the title, “On the Origin of Species”.

Darwinian medicine at its best. And, yes, it made dad feel better.

This is from the “Mario’s Entangled Bank” blog ( http://pineda-krch.com ) of Mario Pineda-Krch, a theoretical biologist at the University of Alberta.

Posted in children, fatherhood, Origin of Species | Leave a comment