Contributing

Data Carpentry is an open source project, and we welcome contributions of all kinds: new lessons, fixes to existing material, bug reports, and reviews of proposed changes.

Contributor Agreement

By contributing, you agree that we may redistribute your work under our license. In exchange, we will address your issues and/or assess your change proposal as promptly as we can, and help you become a member of our community. Everyone involved in Software Carpentry and Data Carpentry agrees to abide by our code of conduct.

Working with GitHub

Submitting Issues on GitHub

If you have an idea about how to improve the lesson, you can submit it as an issue on GitHub. If you have multiple unrelated suggestions, it is best to open a separate issue for each of them. This makes it easier for the project maintainers to discuss and resolve them.

Submitting an issue can count as a contribution for your instructor training checkout. If your contribution is for instructor training, send an email with a link to the issue to checkout@carpentries.org. Please note that it is not necessary to point out in the issue's title or text that it is a contribution for the instructor training checkout.

Submitting Pull Requests

You can also suggest changes by modifying the lesson code directly and submitting your changes as a pull request.

  1. Fork the datacarpentry/R-ecology-lesson repository on GitHub. See the "Fork" button in the top-right corner of the screen on the GitHub website.

  2. Clone that repository to your own machine. (It is also possible to make minor edits right on GitHub.) At your terminal:

    git clone https://github.com/your_username/R-ecology-lesson.git R-ecology-lesson
    cd R-ecology-lesson
    git remote add upstream https://github.com/datacarpentry/R-ecology-lesson.git
  3. Create a branch from main for your changes. Give your branch a meaningful name, such as fix-typos-dplyr-lesson or add-tutorial-on-visualization. At your terminal:

    git checkout -b fix-typos-dplyr-lesson
  4. Make your changes to the Rmd file. If you'd like to check the rendered version of your changes, you can do one of three things:

    • if you have GNU Make installed on your system, type make at your shell terminal.
    • if you use RStudio, click on the "Knit" button in the top-right corner of your editor pane.
    • in other cases, you can type: rmarkdown::render_site("01-intro-to-r.Rmd") in your R console (make sure your working directory is at the root of the lesson) to generate the corresponding HTML file.
  5. Commit the Rmd file you edited (git add file-you-changed.Rmd, followed by git commit -m "fix typos in dplyr lesson"), and push your changes to your repository on GitHub (git push origin fix-typos-dplyr-lesson). If your change affects a lesson, please only commit and push the Rmd files. The rendered versions will be generated by the lesson maintainers to avoid merge conflicts.

  6. Send a pull request (PR) to the main branch of the datacarpentry/R-ecology-lesson repository for this lesson at https://github.com/datacarpentry/R-ecology-lesson
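
Step 5 asks that only Rmd files be committed. As a minimal sketch of how that could be checked before committing, the helper below reads file paths on standard input and prints those that do not end in .Rmd. The function name non_rmd_files is hypothetical and is not part of the lesson tooling; a POSIX shell is assumed.

```shell
# Hypothetical helper (not part of the lesson repository): print every
# path read from standard input that does not end in ".Rmd".
non_rmd_files() {
    # grep exits with status 1 when no line is printed; ignore that case
    grep -v '\.Rmd$' || true
}

# Possible usage inside your clone, after `git add`:
#   git diff --cached --name-only | non_rmd_files
```

If the command prints nothing, only Rmd files are staged. Any path it prints could be unstaged before committing.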

If you are new to Git or GitHub, software like GitHub Desktop can make this process easier for you.

If it is easier for you to send edits to us some other way, please mail us at checkout@carpentries.org. Given a choice between you creating content or wrestling with Git, we'd rather have you doing the former.

File Locations and Formats

RMarkdown

For the R material, lessons are written in RMarkdown (files ending in Rmd). Filenames follow the pattern 00-before-we-start.Rmd, 01-intro-to-r.Rmd and so on. That is, we use two digits followed by a topic key to ensure files appear in the right order when listed.
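
The naming pattern can be expressed as a shell glob. The following is only a sketch: the function name is_lesson_filename is hypothetical, and it assumes a topic key of at least one character between the two-digit prefix and the .Rmd extension.

```shell
# Hypothetical helper: succeed if the name looks like NN-topic-key.Rmd,
# i.e. two digits, a hyphen, a non-empty topic key, and the .Rmd extension.
is_lesson_filename() {
    case "$1" in
        [0-9][0-9]-?*.Rmd) return 0 ;;
        *) return 1 ;;
    esac
}

# Possible usage from the root of the lesson:
#   for f in *.Rmd; do
#       is_lesson_filename "$f" || echo "does not follow the pattern: $f"
#   done
```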

A Makefile converts the Rmd files into HTML files that are processed by Jekyll (the tool GitHub uses to create websites) as explained in the README file.

To ensure consistent formatting across the lessons, we recommend the following formatting guidelines for RMarkdown files:

  • No trailing white space
  • Wrap lines at 80 characters (unless it breaks URLs)
  • Use consistent capitalization (e.g., R not r, RStudio not rstudio or Rstudio)
  • Function names are written as function() while variables are written as variable, and package names as package.
  • Use unclosed atx style headers (see below):

    ## Use this format for headers

    And not this format
    -------------------
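
The first two guidelines can be checked mechanically. The sketch below is one possible approach, not an official lesson tool: the function name check_rmd_format is hypothetical, and it assumes that any line containing http:// or https:// is exempt from the length limit.

```shell
# Hypothetical helper: report lines in a file that end in white space or
# are longer than 80 characters. Lines containing a URL are skipped for
# the length check only. Prints "file:line: problem" for each finding.
check_rmd_format() {
    awk '
        /[ \t]+$/ {
            printf "%s:%d: trailing white space\n", FILENAME, FNR
        }
        length($0) > 80 && $0 !~ /https?:\/\// {
            printf "%s:%d: longer than 80 characters\n", FILENAME, FNR
        }
    ' "$1"
}

# Possible usage from the root of the lesson:
#   check_rmd_format 01-intro-to-r.Rmd
```

A file that follows both guidelines produces no output.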

Formatting RMarkdown Code Chunks

Most R code within .Rmd files is written inside of code chunks. Code chunks can have a name and a number of options, but neither is required. Options are added to a code chunk like this:

    ```{r, chunk_name, option1 = value, option2 = value, ...}

Throughout the lesson, we use different code chunk options, mostly to change when and how the code in the chunks is executed. Below you will find a list of the most common options we use and information on how we use them. More information on RMarkdown code chunk options can be found in the knitr documentation. When in doubt, consult the Rmd files for examples.

answer = [FALSE | TRUE]

The answer option is used in challenges to hide the content of the chunk so that the reader needs to interact with the website to reveal it. The default value is FALSE.

echo = [FALSE | TRUE]

If echo = FALSE, the code will be executed and its output will be visible on the lesson website (unless specified otherwise by the eval, message, or results options), but the code itself will not be visible. This is useful when writing code for the code handout, because it lets you include headings and comments that would be redundant in the lesson itself but help to structure and clarify the code handout. The default value is TRUE.

eval = [FALSE | TRUE]

If eval = FALSE, the code in the chunk will not be executed by R when the file is processed to create the lesson website. Accordingly, no output will be created. This is useful, for example, when seeing the result of the code is not required for the lesson, or when the code chunk contains code that installs or loads packages, downloads files, or opens the R help window. The default value is TRUE.

message = [FALSE | TRUE]

If message = FALSE, messages produced by the code will not be shown. This is useful, for example, when loading packages like tidyverse that print messages when loaded. The default value is TRUE.

purl = [FALSE | TRUE]

Code chunks that have the option purl = TRUE will be included in the code handout (see below). The default value is FALSE.

results = ['markup' | 'hide' | 'asis' | 'hold']

Determines if and how the text output of a code chunk is formatted. Useful values are markup (to format text output using markup, usually formatting it as a code block), asis (to write raw output directly into the document without any markup), and hide (to hide the output, for example when loading data sets).

Code Handout

The code handout code-handout.R contains code that can be distributed to learners. This is particularly useful for error-prone code such as long URLs for downloading files. The code handout is created automatically from the lesson's .Rmd files by make_code_handout.R, which uses the purl() function from knitr. Code that should be included in the code handout must be enclosed in an R code chunk with the chunk option purl = TRUE (see above). To make the handout more useful, consider including explanatory comments.
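
To see which chunks of a file would end up in the handout, one option is to search for chunk headers that set purl = TRUE. This is a rough sketch only: the function name purl_chunks is hypothetical, and it assumes each chunk header sits on a single line that starts with three backticks followed by {r.

```shell
# Hypothetical helper: list, with line numbers, the chunk headers in an
# Rmd file that set purl = TRUE. The backticks are matched with the
# interval expression `{3} (three backticks at the start of the line).
purl_chunks() {
    # grep exits with status 1 when nothing matches; ignore that case
    grep -n -E '^`{3}\{r.*purl[[:space:]]*=[[:space:]]*TRUE' "$1" || true
}

# Possible usage from the root of the lesson:
#   purl_chunks 01-intro-to-r.Rmd
```

Chunks that do not appear in the output are left out of the handout, because the lesson treats purl as FALSE unless a chunk sets it to TRUE.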

Data

We don't store data for lessons inside the lesson repositories. For completed lessons the data should be publicly available in a data repository appropriate to the data type. For lesson development the data may be provided in any way that is convenient including posting to a website, on figshare, a public Dropbox link, a GitHub gist, or even included in the pull request (PR). Once the PR is ready to merge the data should be placed in the official data repository and all links to the data updated.

Raw data go into data_raw/. However, at this stage, this folder is created programmatically and only contains datasets downloaded directly from the Figshare repository. In other words, it can safely be deleted (e.g., using make clean-data or make clean).

The data/ folder only contains data generated/exported by R code.

Images

Images (e.g., screenshots) are stored in the img/ folder. Graphics generated by some R code also go into this folder and get the prefix R-ecology-. This latter case is handled automatically with some knitr options in the setup.R file.

Website Assets

The site_libs folder is generated by the rmarkdown package and holds the JavaScript, CSS, and fonts used by the website.

We aim to have our lessons be as self-contained as possible. Images and other external resources should be included in the repository whenever possible.

FAQ

Page built on: 📆 2020-10-22 ‒ 🕢 04:46:10


Data Carpentry, 2014-2019.
