I am about to start writing up the manuscript for my recent biomath seminar (Act 3: Pineda-Krch. 2011. Cycles at the edge of existence: Emergence of quasi-cycles in strongly destabilized ecosystems.). I put the slides for the talk together using Sweave to illustrate how the literate programming paradigm can improve reproducibility; the question now is whether I should use Sweave for the manuscript as well. If one is to ensure reproducibility, it is a no-brainer: in the computational sciences there is currently no better way to ensure reproducibility than an “executable” manuscript. The problem, however, is that while any self-respecting scientific journal would agree that reproducibility is important, few journals go beyond vague wording on this topic in their guidelines for authors. Specifically, very few journals explicitly accept manuscripts prepared using any of the literate programming systems (e.g. noweb, CWEB, Sweave).
Typically the initial manuscript submission only requires a PDF, which then goes out for peer review (if your lucky stars are aligned properly). Once your manuscript is accepted, however, you inevitably need to submit the LaTeX source (and if you don’t, the journal may take the less travelled and perilous road down the valley of manual typesetting). Of course, with Sweave it would be straightforward to just submit the Sweave-generated LaTeX file. The potential issue is that this is not vanilla LaTeX (though, of course, it is not rocket science for a progressive and open-minded journal either). That could be a problem, particularly if the journal has very specific format and/or formatting requirements and is obsessive-compulsive about them (plenty of journals are). So: to Sweave your manuscript (and risk the wrath of the journal), or not to Sweave (and compromise reproducibility), that is the question.
Of course, a simple solution would be to submit the manuscript as a Sweave file and simply consider any LaTeX problems that come up not to be your problem. Or, as a colleague once put it, once your manuscript is accepted, getting it into print is their problem. As I start writing up the manuscript I remain undecided. I suspect, however, that as a reproducible research/literate programming/Sweave advocate the decision may already have been made for me; now it just needs time to sink in through my thick skull.
Computational science has a long way to go before it reaches the level of reproducibility that is taken for granted in empirical research and in mathematics. Or, as Roger Peng puts it much more eloquently in his recent Science perspective:
“The field of science will not change overnight, but simply bringing the notion of reproducibility to the forefront and making it routine will make a difference. Ultimately, developing a culture of reproducibility in which it currently does not exist will require time and sustained effort from the scientific community.”
Perhaps my manuscript could be one small contribution towards this goal.
This is from the “Mario’s Entangled Bank” blog (http://pineda-krch.com) of Mario Pineda-Krch, a theoretical biologist at the University of Alberta.




Good piece –
I hope to read a follow-up on the ramifications of Sweaving your next paper.
I have published one Swoven paper: Bolker, B. M., doi:10.1098/rsif.2009.0384. I don’t remember any details (after we tried PNAS, it was published in J Roy Soc Interface). All my other recent papers have been written in collaboration with non-Sweavers (a bigger obstacle than journal editorial offices, I would say). Your case is likely to be particularly simple because you are probably not going to leave exposed (echo=TRUE) code chunks in your manuscript, and everything else gets converted to “vanilla” LaTeX in the Sweaving process …
(oops, sorry, the bib details got swallowed: Bolker et al. 2010, http://rsif.royalsocietypublishing.org/content/7/46/811.short )
I’d like to hear what you decide. I have been using LaTeX, R, and Sweave for my work in microscopy, image processing and analysis for about a year now, and I have used Sweave for a few documents. However, in my work the data processing can take a while, and I don’t want to redo the data analysis each time I edit the report. I haven’t tried cacheSweave yet. I tend to use either shell scripts or batch files (I use Linux, Mac OS X, and Windows), and I may move to makefiles. Basically, I have a series of scripts that do the analysis, and I can rerun as many of them as I need when things change. I like the idea of a “one script that rules them all” approach that regenerates everything from the get-go, with the final document produced as the last step. I also keep all of this under version control with git.
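The “only redo what changed” workflow described above is essentially what make does. Below is a minimal Python sketch of make’s timestamp rule. It assumes each step is a command that turns some source files into a single target file. The function names and file names are hypothetical, and cacheSweave and makefiles, the tools the comment actually mentions, are the established alternatives.

```python
import os
import subprocess
from typing import Sequence


def is_stale(target: str, sources: Sequence[str]) -> bool:
    """Return True if `target` is missing or older than any of its sources.

    This mirrors the timestamp rule make applies to each target.
    """
    if not os.path.exists(target):
        return True
    target_mtime = os.path.getmtime(target)
    return any(os.path.getmtime(src) > target_mtime for src in sources)


def run_if_stale(target: str, sources: Sequence[str], command: Sequence[str]) -> bool:
    """Run `command` only when `target` is out of date; report whether it ran."""
    if is_stale(target, sources):
        # check=True raises CalledProcessError if the step fails.
        subprocess.run(list(command), check=True)
        return True
    return False
```

Consider a hypothetical pipeline that first calls `run_if_stale("results.rds", ["images.csv", "analysis.R"], ["Rscript", "analysis.R"])` for the slow analysis. It then calls `run_if_stale("report.tex", ["results.rds", "report.Rnw"], ["R", "CMD", "Sweave", "report.Rnw"])` for the report. With this setup, editing only report.Rnw re-runs Sweave but not the analysis.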
For the Journal of Statistical Software (http://www.jstatsoft.org/) we do encourage the use of Sweave, but we do not actually use it directly in production. For accepted papers we require submission of the .tex file and graphics (plus .bib) and a .R file with replication code; all of that can be conveniently produced from the .Rnw. If authors want to share the original .Rnw as well, we encourage them to include it as a vignette in their package. Thus, on a more conceptual level: we publish a static snapshot (the one that was reviewed) of the dynamic Sweave document.
The reason is that compiling and maintaining the dynamic documents would currently still take more resources than we have. We retain a lot of the reproducibility but lose some “literate” aspects.
Personally, I use the same strategy of submitting static snapshots of Sweave documents for almost all of my papers. If the target journal already has facilities for sharing replication material at the reviewing stage, I include the code in my submission. Otherwise I often try to include it in one of my CRAN packages, if appropriate. This is probably not the ideal solution either, but it is still better than what is available for many “standard” papers.
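The replication .R file in the JSS workflow described above is what R’s `Stangle()` extracts from a .Rnw source. The Python sketch below illustrates only that idea and is not a replacement for Stangle. It assumes the standard noweb-style markup, in which a code chunk opens with a `<<...>>=` line and a line beginning with `@` switches back to LaTeX. The function name `tangle` is hypothetical. The sketch also ignores chunk reuse via `<<name>>` references and inline `\Sexpr{}` code.

```python
import re

# In Sweave's noweb-style syntax a code chunk opens with a line such as
# "<<fig1, echo=FALSE>>=" and a line starting with "@" switches back to LaTeX.
CHUNK_START = re.compile(r"^<<(.*)>>=\s*$")


def tangle(rnw_text: str) -> str:
    """Concatenate the code chunks of a Sweave document into one R script.

    Simplified sketch: chunk references (<<name>> inside a chunk) are copied
    verbatim rather than expanded, and inline \\Sexpr{} code is ignored.
    """
    code_lines = []
    in_code = False
    for line in rnw_text.splitlines():
        if CHUNK_START.match(line):
            in_code = True  # a new chunk header also ends any open chunk
            continue
        if line.startswith("@"):
            in_code = False  # back to documentation (LaTeX) mode
            continue
        if in_code:
            code_lines.append(line)
    return "\n".join(code_lines) + "\n" if code_lines else ""
```

In R itself, `Sweave("paper.Rnw")` writes paper.tex and `Stangle("paper.Rnw")` writes paper.R. A journal can therefore receive both static files from a single source.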
I think Ben Bolker’s being too modest. If the files posted on his website don’t mislead, he’s published a swoven _book_ as well.
One addition: the paper Ben links to looks like a good read by itself, but I wanted to have a look at the Sweave file. There is a “data supplement”, but strangely enough it appears as just the text “R code and data files” (http://rsif.royalsocietypublishing.org/content/early/2009/10/27/rsif.2009.0384/suppl/DC1).
Oops. How about http://www.math.mcmaster.ca/bolker/misc/bolker_jrsi.tgz ? (I know that defeats the purpose; if I get to it I’ll try to nag the JRSI editorial office.)
The complications discussed in this blog are some of the reasons why we decided to develop the Potsdam Mind Research Repository (http://read.psych.uni-potsdam.de/pmr2/; see also “About”). All the journals in our field (experimental psychology) have some form of open access policy (usually the green route). This gives us the option to publish the preprint, postprint, or PDF and to associate data and analysis scripts with each peer-reviewed publication (a “paper package”). We are also ready to include “paper packages” from other labs working on related topics. We think that such small communities are more likely to engage in exchange and re-analysis of published data than large all-purpose archives.
I wrote almost everything in Sweave, which motivated my reinvention of the wheel: the knitr package (http://yihui.github.com/knitr/). Now my answer to your question is definitely yes. It takes me 10 seconds to set up a document with LyX and knitr from scratch (see this short video http://vimeo.com/32948939), so why not?
I think a broader question is how our journals should change in this age to become reproducible. Traditional papers are published in a “fixed” manner: once submitted, they are “dead”. I believe that at least electronic journals like JSS could try to provide an environment like CRAN that rebuilds papers periodically from source, with the original authors notified in case of errors. Why should software packages be rebuilt and rechecked periodically, but not scientific papers?