Mario’s Entangled Bank

Distributed Open Notebook Science

August 12, 2008 · 2 Comments

One of the interesting concepts that emerged during SciFoo, i.e. one that I was actually able to comprehend (sorry, Garrett, your 2-dimensional projection of the 7-dimensional particle space did not manage to penetrate my skull, but the pictures were darn pretty), is the idea of distributed Open Notebook science. This is basically Open Notebook science with a twist that (perhaps unintentionally) follows the spirit of distributed Source Code Management (SCM) systems such as Git, where each client has a fully functional implementation of the tool that can easily be distributed and shared among other clients and servers. Garrett Lisi, one of the SciFoo’ers, is using the open source TiddlyWiki as an open notebook for his work in theoretical physics. The neat thing about distributed tools like Git or TiddlyWiki is the redundancy and flexibility they provide. They are redundant because there is no vulnerable master copy: anyone who downloads a working copy has the fully functional tool and the complete contents at their disposal on the client side. They are flexible because they do not depend on network access or a central server, which also means that they are likely to be very fast. A lot more could be said in favor of such distributed tools; Linus Torvalds has given a talk about the virtues of his SCM system Git (the venue may look familiar to those who attended SciFoo). Inspired by Garrett’s Open Notebook site Deferential Geometry, I am antsy to explore whether TiddlyWiki could work for me.

This is from the “Mario’s Entangled Bank” blog ( http://pineda-krch.com ) of Mario Pineda-Krch, a theoretical biologist at the University of California, Davis.

Categories: Git · Linus Torvalds · Open Notebook science · Science Foo

2 responses so far ↓

  • Ralph Giles // August 13, 2008 at 9:34 am

    This is indeed very exciting! Is anyone actually using a DVCS for their data?

    Garrett’s site looks like a great way to share a notebook. From what I can tell, TiddlyWiki isn’t a distributed system per se, although there’s some experimental work in that direction. It can pickle itself into a single file which can be used in a DVCS of course.

    I suspect the pull/marshall/push discipline of distributed version control systems is too much for this sort of notebook. It’s best for building on top of someone else’s work, for example doing new or revised analysis on a dataset, or updating a living review article. But for sharing thoughts and ideas, something like a feed reader inside TiddlyWiki would be better, so you could read, and easily import, updates from other notebooks.

  • Bill Flanagan // August 15, 2008 at 11:01 am

    One side effect of pickling is that at the end of the process a consumer now possesses “a pickle”. When a researcher loses their iPhone or accidentally deletes the HTML file sitting on their desktop, they, and those supporting them, are then “in a pickle”.

    I’m not a biologist but I develop OpenWetWare for them. Many of the lab notebooks we host are used by teams. MediaWiki is our platform and, among other things, is first and foremost a document manager.

    The software ideally focuses on individual pages and not upon sets of pages. At OpenWetWare, we have constructed tools and templates to facilitate managing document hierarchies to support our researchers: they see their lab notebooks as a more-or-less unified set of pages.

    This is 180 degrees from Git’s notion of managing revisions of a project as a whole rather than focusing on individual files. In our world, that would be a ‘lab notebook’ for a project that a set of our members were participating in. I like the idea of knowing what’s going on with the project a lot. But we can never lose sight of any single contribution; data is data and we really need to preserve it.

    Part of the reason for using a lab notebook based upon MediaWiki is that not only the current state of a document is accessible but all of the revisions to it are as well.

    This may present concerns for some folks: we hear that people don’t want to put things into our lab notebooks that they do not know to be 100% accurate, since their contributions will become part of the collective record of the project. The notebook can be, and often is, a record.

    But if creating an official record does not directly lead to an individual researcher being a better researcher, it won’t be seen as a useful tool. It reminds me of companies who keep “2 sets of books”: one formal for tax purposes and the other that accurately but privately represents the real value of the company.

    We’re thinking that a structure similar to the way WordPress handles writing blog posts may be worth considering. Create a document, save the changes until you’re happy with it, then publish it. The act of publishing is the ‘commit’ that would become part of the record. In the interim, the researcher has a safe and secure way to enter data that will not go away. They could save their drafts and have access to multiple checkpoints on their way to completing a protocol or doing data analysis.

    By doing this, it would perhaps help alleviate another problem we see. People keep a document open for a long time knowing that when they save, their changes become accessible to all. This can also be a path to pickledom: connections can come and go. And so can sessions.

    If we were to support this ‘private until published’ revision model, I’m curious as to how it would be received.
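
The ‘private until published’ idea can be made concrete with a small sketch. This is only an illustration under assumptions, not how OpenWetWare, MediaWiki or WordPress actually implement anything: the class name `NotebookPage` and its methods are hypothetical, and the choice to discard drafts once a page is published is just one possible policy.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class NotebookPage:
    """Hypothetical notebook page: drafts are private, published text is the record."""
    title: str
    drafts: List[str] = field(default_factory=list)     # private checkpoints
    published: List[str] = field(default_factory=list)  # public revision history

    def save_draft(self, text: str) -> None:
        # Saving a checkpoint never touches the public record.
        self.drafts.append(text)

    def publish(self) -> str:
        # Publishing is the 'commit': the latest draft joins the record.
        if not self.drafts:
            raise ValueError("nothing to publish")
        text = self.drafts[-1]
        self.published.append(text)
        self.drafts.clear()  # assumption: old checkpoints are dropped after publishing
        return text

    def public_view(self) -> Optional[str]:
        # What other members see: the latest published revision, if any.
        return self.published[-1] if self.published else None


page = NotebookPage("PCR protocol")
page.save_draft("first rough notes")
page.save_draft("cleaned-up notes")
print(page.public_view())  # nothing is public before publishing
page.publish()
print(page.public_view())  # the last saved draft is now the public revision
```

The researcher can save as often as they like without anything reaching the shared record, which would also remove the incentive to keep an editing session open for hours.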

    Thanks.
