[GRASS-user] Open Peer-Review Reproducible Publication with Org and GRASS

Hi Org and GRASS lists,

I just wanted to let these two lists know that I've just posted a paper written in Org and using GRASS (text-mode) and Python for the analysis. My goal was to create not just an open access publication, but a fully reproducible publication. This is an early announcement, and the paper may not pass peer review.

The Supplemental Material is the Org file with all the code to generate the document, beginning with downloading the 3rd party data that is input to our analysis, the GRASS code to perform the analysis, and the Python code to regenerate the figures.

I don't think I did a great job on the reproducible part because I have a highly customized .emacs, etc. All the information necessary to replicate the work should be in the Supplemental Material, but it might not be easy to do so. Anyway, I think it is a step in the right direction.

To make it easier to reproduce... including my emacs.org seems overkill. Including a Virtual Machine that contains everything, including my ~/.emacs.d/ and all the software and data seems like the right thing to do, but journals don't want to host a 20 GB VM with the publication.

Thanks to people on these two lists who have developed the software and helped me use it.

   -k.
   
http://www.the-cryosphere-discuss.net/tc-2016-113/

On Fri, Jun 3, 2016 at 4:19 PM, Ken Mankoff <mankoff@gmail.com> wrote:

> Hi Org and GRASS lists,
>
> I just wanted to let these two lists know that I've just posted a paper written in Org and using GRASS (text-mode) and Python for the analysis. My goal was to create not just an open access publication, but a fully reproducible publication. This is an early announcement, and the paper may not pass peer review.
>
> The Supplemental Material is the Org file with all the code to generate the document, beginning with downloading the 3rd party data that is input to our analysis, the GRASS code to perform the analysis, and the Python code to regenerate the figures.
>
> I don't think I did a great job on the reproducible part because I have a highly customized .emacs, etc. All the information necessary to replicate the work should be in the Supplemental Material, but it might not be easy to do so. Anyway, I think it is a step in the right direction.
>
> To make it easier to reproduce... including my emacs.org seems overkill. Including a Virtual Machine that contains everything, including my ~/.emacs.d/ and all the software and data seems like the right thing to do, but journals don't want to host a 20 GB VM with the publication.

...this is quite cool!

How about distributing it as a docker image?

> Thanks to people on these two lists who have developed the software and helped me use it.
>
>    -k.
>
> http://www.the-cryosphere-discuss.net/tc-2016-113/

Markus

On Sat, Jun 11, 2016 at 9:41 AM, Markus Neteler <neteler@osgeo.org> wrote:

>> To make it easier to reproduce... including my emacs.org seems
>> overkill. Including a Virtual Machine that contains everything, including
>> my ~/.emacs.d/ and all the software and data seems like the right thing to
>> do, but journals don't want to host a 20 GB VM with the publication.
>
> ...this is quite cool!

Very cool. It is for a publication (of any kind) by itself.

> How about distributing it as a docker image?

A Dockerfile or Vagrantfile aims to solve the distribution of the environment
without distributing the binaries.

On Fri, Jun 3, 2016 at 4:19 PM, Ken Mankoff <mankoff@gmail.com> wrote:
> How about distributing it as a docker image?

A lot of people have suggested Docker. I'm thinking about reproducibility on the decadal timeframe more than next month or next year. Perhaps that is the wrong focus? Anyway, I don't trust Docker on these timescales.

I have a Windows 98 Virtual Machine that is ~10 years old that I've carried from computer to computer many times. Now that VirtualBox, VMware, and others can share VMs, and have done so for a long time, that seems like a long-term approach.

I doubt the docker commands on OS X (which seems only partially supported) that I need to run today will work in 1-10 years.

I've also realized the Org part of the document is irrelevant. You don't need to use Emacs, Org, and Babel to reproduce it. You can cut-and-paste the grass code sections into a terminal. Emacs + Org is a much higher barrier to reproduction than "install bash and grass 7.0.3". I'll point this out in the revisions. Cut-and-paste is good enough for now, although a single "make" command is still the goal. Perhaps for the next paper...
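
The cut-and-paste step could also be automated without Emacs. What follows is a minimal sketch, not the paper's actual workflow: it assumes the code sections in the Org file are standard "#+BEGIN_SRC <lang> ... #+END_SRC" blocks and that the GRASS sections are labelled "sh" (both are assumptions about the Supplemental Material). The function names are made up for illustration.

```python
import re

# Matches Org source blocks: a "#+BEGIN_SRC <lang> [header args]" line,
# the block body, and the closing "#+END_SRC" line.
SRC_BLOCK = re.compile(
    r"^[ \t]*#\+BEGIN_SRC[ \t]+(\S+)[^\n]*\n(.*?)^[ \t]*#\+END_SRC[ \t]*$",
    re.IGNORECASE | re.MULTILINE | re.DOTALL,
)


def extract_blocks(org_text, lang):
    """Return the bodies of all source blocks written in `lang`, in file order."""
    return [
        body
        for block_lang, body in SRC_BLOCK.findall(org_text)
        if block_lang.lower() == lang.lower()
    ]


def to_script(org_text, lang="sh"):
    """Join the extracted blocks into one script that stops at the first error."""
    # "sh" as the block label is an assumption; adjust to the label actually used.
    return "#!/bin/bash\nset -e\n\n" + "\n".join(extract_blocks(org_text, lang))
```

Writing the result of to_script() on the contents of the Org file to disk would give a plain shell script to run inside a GRASS session, which is closer to the single-command goal than pasting each block by hand.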

  -k.

On Sun, Jun 12, 2016 at 2:28 AM, Ken Mankoff <mankoff@gmail.com> wrote:

> On Fri, Jun 3, 2016 at 4:19 PM, Ken Mankoff <mankoff@gmail.com> wrote:
>> How about distributing it as a docker image?
>
> A lot of people have suggested Docker. I'm thinking about reproducibility on the decadal timeframe more than next month or next year. Perhaps that is the wrong focus? Anyway, I don't trust Docker on these timescales.

Here is an example of what has been done:

Topic: Amazon Forest Green-Up During 2005 Drought
Science article: http://science.sciencemag.org/content/318/5850/612

Others reproduced the computations:

Reproduction of the computations on the article "Amazon Forest
Green-Up During 2005 Drought"
https://github.com/albhasan/amazonGreenUp2005

The git/docker repo itself is tiny; it basically installs itself,
downloads the needed data, processes them, and runs the computations.
Quite interesting.

In the GRASS GIS case, and maybe also in your case, a shell/Python script
to batch process the data might do the same, as you say.

The above is just for illustration.
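
As a rough sketch of such a batch script (an assumption about how it could look, not code from the linked repository or from the paper), a small Python driver can run the stages in order and stop at the first one that fails. The function name and the step names in the usage note are hypothetical.

```python
import subprocess
import sys


def run_pipeline(steps):
    """Run each (name, command) step in order and stop at the first failure.

    `command` is an argument list as accepted by subprocess.run.
    Returns the names of the steps that completed successfully.
    """
    done = []
    for name, command in steps:
        print("running:", name)
        result = subprocess.run(command)
        if result.returncode != 0:
            # Later steps depend on earlier ones, so do not continue.
            print("failed:", name, file=sys.stderr)
            break
        done.append(name)
    return done
```

It would be called with something like run_pipeline([("download", [...]), ("analysis", [...]), ("figures", [...])]), where each list holds the real command for that stage: fetching the third-party data, running the GRASS processing, and regenerating the figures.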

Best
Markus