Data Carpentry is an open source project, and we welcome contributions of all kinds: new lessons, fixes to existing material, bug reports, and reviews of proposed changes.

Contributor Agreement

By contributing, you agree that we may redistribute your work under our license. In exchange, we will address your issues and/or assess your change proposal as promptly as we can, and help you become a member of our community. Everyone involved in Software Carpentry and Data Carpentry agrees to abide by our code of conduct.

Working With GitHub

NOTE: this repository uses gh-pages as our default branch.

  1. Fork the datacarpentry/R-genomics repository on GitHub. See the “Fork” button in the top-right corner of the screen on the GitHub website.

  2. Clone that repository to your own machine. (It is also possible to make minor edits right on GitHub.) At your terminal:

  3. Create a branch from gh-pages for your changes. Give your branch a meaningful name, such as fix-typos-dplyr-lesson or add-tutorial-on-visualization. At your terminal:

  4. Make your changes to the Rmd file. If you’d like to check the rendered version of your changes, you can do one of three things:

    • if you have GNU Make installed on your system, type make at your shell terminal.
    • if you use RStudio, click on the “Knit” button in the top-right corner of your editor pane.
    • in other cases, you can type: rmarkdown::render_site("01-intro-to-r.Rmd") in your R terminal (make sure your working directory is at the root of the lesson) to generate the corresponding html file.
  5. Commit the Rmd file you edited (git add file-you-changed.Rmd, followed by git commit -m "fix typos in dplyr lesson"), and push your changes to your repository on GitHub (git push origin fix-typos-dplyr-lesson). If your change affects a lesson, please only commit and push the Rmd files. The rendered versions will be generated by the lesson maintainers to avoid merge conflicts.

  6. Send a pull request (PR) to the gh-pages branch of the datacarpentry/R-genomics repository for this lesson at

If it is easier for you to send edits to us some other way, please mail us at Given a choice between you creating content or wrestling with Git, we’d rather have you doing the former.
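
For those who do use Git, steps 2, 3 and 5 above can be sketched as two small shell functions. This is only an illustration under stated assumptions: GITHUB_USER is a placeholder for your own account name, the fork is assumed to keep the name R-genomics and to be cloned over HTTPS, and the function names start_contribution and publish_contribution are hypothetical helpers, not part of the lesson repository.

```shell
# Placeholder values (assumptions): replace with your GitHub user name
# and a meaningful branch name for your change.
GITHUB_USER="your-username"
BRANCH="fix-typos-dplyr-lesson"

# Steps 2 and 3: clone your fork and create a branch from gh-pages.
start_contribution() {
  git clone "https://github.com/${GITHUB_USER}/R-genomics.git" &&
    cd R-genomics &&
    git checkout -b "$BRANCH" gh-pages
}

# Step 5: commit only the Rmd file you edited and push the branch to your fork.
# $1 is the file you changed, $2 is the commit message.
publish_contribution() {
  git add "$1" &&
    git commit -m "$2" &&
    git push origin "$BRANCH"
}
```

After editing a lesson you might run start_contribution once, then publish_contribution file-you-changed.Rmd "fix typos in dplyr lesson", and finally open the pull request against gh-pages on the GitHub website.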

File Locations and Formats

Each lesson is composed of files such as 00-before-we-start.Rmd, 01-intro-to-r.Rmd and so on. (We use two digits followed by a topic key to ensure files appear in the right order when listed.)

For the R material, lessons must be written in RMarkdown (file names ending in Rmd). A Makefile converts the Rmd files into HTML files that are processed by Jekyll (the tool GitHub uses to create websites), as explained in the README file.

Important Note: We use the purl() function from knitr to generate a skeleton file that contains code to be distributed to the workshop participants. This strategy is particularly useful for error-prone pieces of code (e.g., code that contains long URLs). To take full advantage of it, every line of code that should be included in the handout must be enclosed in an R code chunk with purl=TRUE in the chunk options. Further, to aid students’ use of the handout code, consider including explanatory comments. When writing Challenges in particular, you may need to include redundant comments and use the chunk option echo=FALSE. When in doubt, consult the Rmd files for examples.
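
To preview what purl() extracts from a lesson, something like the following shell helper could be used. It is a minimal sketch, not the way the maintainers build the handout: the function name make_handout and the choice to write the output next to the Rmd file are assumptions, and it requires Rscript and the knitr package to be installed.

```shell
# make_handout LESSON.Rmd: write the R code chunks of a lesson to LESSON.R.
# Hypothetical helper for local previews only.
make_handout() {
  rmd="$1"
  if [ ! -f "$rmd" ]; then
    echo "make_handout: no such file: $rmd" >&2
    return 1
  fi
  out="${rmd%.Rmd}.R"
  # documentation = 0 writes only the chunk code, without the surrounding prose.
  Rscript -e 'args <- commandArgs(trailingOnly = TRUE); knitr::purl(args[1], output = args[2], documentation = 0)' "$rmd" "$out"
}
```

For example, make_handout 01-intro-to-r.Rmd run from the root of the lesson would write 01-intro-to-r.R, which you can inspect to confirm that the intended chunks and comments made it into the handout.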

Images (e.g., screenshots) go into the img/ folder. Graphics generated by some R code also go into this folder and get the prefix R-ecology-. This latter case is handled automatically with some knitr options in the setup.R file.

Raw data go into data/. However, at this stage, this folder is created programmatically and only contains datasets downloaded directly from the figshare repository. In other words, it can safely be deleted (e.g., using make clean-data or make clean).

The data_output/ folder only contains data generated/exported by R code.

The site_libs folder is generated by the rmarkdown package and holds the JavaScript, CSS, and fonts used by the website.


We don’t store data for lessons inside the lesson repositories. For completed lessons the data should be publicly available in a data repository appropriate to the data type. For lesson development the data may be provided in any way that is convenient including posting to a website, on figshare, a public Dropbox link, a GitHub gist, or even included in the pull request (PR). Once the PR is ready to merge the data should be placed in the official data repository and all links to the data updated.

Formatting of the material

To ensure a consistent formatting of the lessons, we recommend the following guidelines:

  • No trailing white space
  • Wrap lines at 80 characters (unless it breaks URLs)
  • Use consistent capitalization (e.g., R not r, RStudio not rstudio or Rstudio)
  • Function names are written as function(), variables as variable, and package names as package.
  • Use unclosed atx style headers (see below):
## Use this format for headers

And not this format
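
The first two guidelines (no trailing white space, lines wrapped at 80 characters) can be checked mechanically before committing. The helper below is a rough sketch: check_format is a hypothetical name, and skipping every line that contains an http or https URL is a simple heuristic for the "unless it breaks URLs" exception, not an official rule. Some awk implementations count bytes rather than characters, so lines with non-ASCII text may be over-reported.

```shell
# check_format FILE: report trailing white space and over-long lines.
# Returns 0 when the file is clean and 1 otherwise.
check_format() {
  file="$1"
  rc=0
  # Lines that end in a space or a tab.
  if grep -n '[[:space:]]$' "$file"; then
    echo "$file: trailing white space on the lines listed above" >&2
    rc=1
  fi
  # Lines longer than 80 characters, ignoring lines that contain a URL.
  if awk 'length($0) > 80 && $0 !~ /https?:\/\// {
            print FILENAME ":" FNR ": " length($0) " characters"
            found = 1
          }
          END { exit !found }' "$file"; then
    rc=1
  fi
  return $rc
}
```

Running check_format 01-intro-to-r.Rmd prints any offending lines with their line numbers and returns a non-zero status if it finds a problem.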


Data Carpentry, 2017-2018. License. Contributing.
Questions? Feedback? Please file an issue on GitHub.
On Twitter: @datacarpentry