Reproducibility Toolkit

Overview

Teaching: 30 min
Exercises: 10 min
Questions
  • What tools will we be using?
  • How can we use these tools to improve reproducibility?

Objectives
  • Learn to use R, RStudio and RMarkdown.

Our reproducibility toolkit

R + RStudio

Why R?

Why not language X?

Once in R

You could just type into the command prompt…

A better solution

R Packages

Packages are the fundamental units of reproducible R code. They include reusable R functions, the documentation that describes how to use them, and (often) sample data. (From: http://r-pkgs.had.co.nz)

install.packages("tidyverse")
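A package only needs to be installed once per R installation, so scripts often guard the install step. The sketch below is one way to do that. The helper name ensure_package and the CRAN mirror URL are assumptions made for illustration, not part of the lesson materials.

```r
# Hypothetical helper: install a package only when it is missing.
# The mirror URL is an assumption; any CRAN mirror should work.
ensure_package <- function(pkg, repos = "https://cloud.r-project.org") {
  if (!requireNamespace(pkg, quietly = TRUE)) {
    install.packages(pkg, repos = repos)
  }
  # Report whether the package can now be loaded.
  invisible(requireNamespace(pkg, quietly = TRUE))
}

# Demonstrated with "stats", which ships with R, so nothing is downloaded here.
ok <- ensure_package("stats")
print(ok)
```

For this lesson you would call ensure_package("tidyverse") and then library(tidyverse) at the top of your script or R Markdown document.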

Demo

Goals of the demo:

R Markdown demo

Open intro-template.Rmd

Click on Knit HTML to compile the document

Important features:

Extending the analysis

Great news!? We just received some more data, in bits and pieces of course:

  • gapminder-7080.csv
  • gapminder-90plus.csv

Let’s walk through generating new plots for the 1970s, the 1980s, and the 1990s onward (these new analyses are already in the intro-tutorial.Rmd document).

Note that all the code required for these tasks is also in the template, so you do not need to write any R code yourself. Knit the document to combine the datasets, and you’ll see that the code for recreating the plots is the same as above. That’s the beauty of RMarkdown!
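To give a feel for what "combining the datasets" involves, here is a minimal sketch in base R. It is not the code from the template: the column names (country, year, lifeExp) and the values are illustrative stand-ins written to a temporary directory, and the real files may be structured differently.

```r
# Write two small stand-in files to a temporary directory.
# Columns and values are illustrative assumptions, not the real data.
tmp <- tempdir()
writeLines(c("country,year,lifeExp",
             "Canada,1972,72.9",
             "Canada,1982,75.8"),
           file.path(tmp, "gapminder-7080.csv"))
writeLines(c("country,year,lifeExp",
             "Canada,1992,78.0",
             "Canada,2002,79.8"),
           file.path(tmp, "gapminder-90plus.csv"))

# Read every file the same way, then stack the rows into one data frame.
files <- file.path(tmp, c("gapminder-7080.csv", "gapminder-90plus.csv"))
gap_new <- do.call(rbind, lapply(files, read.csv, stringsAsFactors = FALSE))

# Sort by year and label each row with its decade so plots can be grouped.
gap_new <- gap_new[order(gap_new$year), ]
gap_new$decade <- paste0(floor(gap_new$year / 10) * 10, "s")

print(gap_new)
```

Because the files are read and stacked by code rather than by hand, adding a third file later only means adding its name to the files vector and knitting again.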

Takeaways

Reproducibility checklist

  • Serves as a tool to help you think about the reproducibility of your data analysis.
  • Many of the questions can be thought of as having a yes/no answer.
  • A better approach is to treat each question as open-ended, with the real question being, “What can I do to improve the status of my project on this bullet point?”
  • With that in mind, you’ll never get 100% of the bullets right for your project, but you’ll always be improving.

Key Points

  • R, RStudio and RMarkdown allow for powerful reproducible research.