The data used for this lesson are in the figshare repository at: https://doi.org/10.6084/m9.figshare.1314459
This lesson mostly uses combined.csv. The three other csv files, including surveys.csv, are only needed for the lesson on databases.
combined.csv is downloaded directly in the episode "Starting with Data" and does not need to be downloaded beforehand. However, this requires a decent internet connection in the room where the workshop is taught. To facilitate the download, the code handout (see next paragraph) includes the chunk of code that specifies the URL of the csv file, the folder where the file should be saved, and the name it should be given. This approach ensures that the file ends up where the lesson expects it to be, and it teaches the good, reproducible practice of automating the download. If learners haven't created the data/ directory or are not in the correct working directory, the download.file command will produce an error, so it is important to use the stickies at this point.
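A minimal sketch of the download step that avoids the download.file() error described above by creating the folder first. It assumes the handout's chunk follows this general pattern; the helper name download_combined() is hypothetical, and the URL in the usage line is a placeholder, not the real file address, so copy the actual URL from the code handout.

```r
# Hypothetical helper: create the target folder if needed, then download.
download_combined <- function(url, dir = "data", file = "combined.csv") {
  # download.file() fails if the destination folder does not exist,
  # so create it first (silently does nothing if it is already there)
  dir.create(dir, showWarnings = FALSE, recursive = TRUE)
  dest <- file.path(dir, file)
  download.file(url = url, destfile = dest)
  invisible(dest)
}

# Usage (placeholder URL, replace with the one from the code handout):
# download_combined(url = "<URL from the code handout>")
```

Wrapping the call in a function is only a convenience for troubleshooting; the handout's plain download.file() call works the same way once the folder exists.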
The code handout (a link to download it is also available on the top bar of the lesson website) is useful for Data Carpentry workshops. It includes an outline of the lesson content, the text for the challenges, the links for the files that need to be downloaded for the lesson, and pieces of code that may be difficult to type for learners who have no programming experience or are unfamiliar with R's syntax. We encourage you to distribute it to the learners at the beginning of the lesson, and we encourage instructors to do the live coding directly in this file so the participants can follow along.
With the release of R 4.0.0 in April 2020, an important change was made to R: the default for stringsAsFactors is now FALSE instead of TRUE. As a result, functions such as data.frame() no longer automatically convert character columns to factors (see the R 4.0.0 NEWS for details).
This change should not cause any problems with this lesson, whether or not R ≥ 4.0.0 is used, because the lesson uses read_csv() from the readr package (part of the tidyverse) throughout. Unlike read.csv() from base R, read_csv() never converts character columns to factors, regardless of the R version.
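A small base R illustration of the stringsAsFactors change, which needs no extra packages and can be shown if learners see factors where they expect characters (or vice versa). The column name species and its values are made up for the demo.

```r
# The same small table, with and without explicit factor conversion
df_default <- data.frame(species = c("DM", "DO", "DM"))
df_factor  <- data.frame(species = c("DM", "DO", "DM"), stringsAsFactors = TRUE)

# "character" in R >= 4.0.0, "factor" in older versions
class(df_default$species)

# Asking explicitly gives a factor in any R version
class(df_factor$species)
levels(df_factor$species)

# Base R's read.csv() follows the same default as data.frame()
df_csv <- read.csv(text = "species\nDM\nDO\nDM")
class(df_csv$species)
```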
Nevertheless, it is recommended that learners install a version of R ≥4.0.0, and instructors and helpers should be aware of this potential source of error.
Some learners may have previous R installations. On a Mac, installing a new version of R should create a symbolic link named Current that points to the new installation. Sometimes this does not happen, and even though the new R is installed and can be accessed via the R console, RStudio does not find it. As a result, the learner's RStudio keeps running the older R installation, which will cause package installations to fail. This can be fixed in the terminal. First, check which R versions are installed:
ls -l /Library/Frameworks/R.framework/Versions/
We are currently using R 4.0.x. If it isn't listed, the learner will need to install it. If it is present, point the Current symbolic link to the 4.0.x directory (replace 4.0.x with the directory name shown by ls, and prefix the command with sudo if you get a permission error):
ln -sfn /Library/Frameworks/R.framework/Versions/4.0.x /Library/Frameworks/R.framework/Versions/Current
Then restart RStudio.
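To confirm that RStudio picked up the intended installation after the restart, you could ask R itself which version it is running. These are standard base R calls typed into the RStudio console; compare the version number with the one the learner just installed.

```r
# Full version string of the R that RStudio is currently running
R.version.string

# Installation directory of that R
R.home()

# TRUE if this R is recent enough for the lesson
getRversion() >= "4.0.0"
```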
On older versions of macOS, axis labels may not show up when calling plot() (section "renaming factors" in "Starting with Data"). This issue might be caused by the default font, Arial, being deactivated so that R cannot find it. To resolve it, open Finder, search for Font Book and open it. Look for the Arial font and, if it is greyed out, turn it on.
If the problem occurs with ggplot2 plots, an alternative workaround is to change the default theme for the R session so that ggplot2 uses a serif font. Because Arial is a sans-serif font, this makes R load a different font. This can be done with theme_update(text = element_text(family = "serif")).
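A sketch of the ggplot2 workaround, assuming ggplot2 is installed. theme_update() invisibly returns the previous theme, which can be passed back to theme_set() to undo the change later in the session. The built-in mtcars data set is used here only for illustration.

```r
library(ggplot2)

# Switch the default text family for all subsequent ggplot2 plots,
# keeping the previous theme so the change can be undone
old_theme <- theme_update(text = element_text(family = "serif"))

# Plots created and printed from now on use the serif family for their labels
p <- ggplot(data = mtcars, mapping = aes(x = wt, y = mpg)) +
  geom_point()

# Restore the previous default theme when the workaround is no longer needed
theme_set(old_theme)
```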
Save yourself some aggravation and have everyone check that they can install all these packages before you start the first day. See the "Preparations" section on the homepage of the course website for package installation instructions.
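Sometimes learners are unable to install the tidyverse package. In that case, they can try to install the individual packages that are actually needed (note that the package names must be combined with c(), otherwise install.packages() treats the second name as a library path):
install.packages(c("readr", "lubridate", "dplyr", "tidyr", "ggplot2", "dbplyr"))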
data_raw (all lowercase) subfolder.
The two main goals for this lesson are:
aes() function, (3) basic customization of the plots.
It may be worthwhile to mention that colors can also be specified by their HEX code (http://colorbrewer2.org):
ggplot(data = surveys_complete, mapping = aes(x = weight, y = hindfoot_length)) +
  geom_point(alpha = 0.1, color = "#FF0000")
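HEX codes are red, green and blue intensities written in base 16, so "#FF0000" is full red with no green or blue. If learners ask what a code means, a quick check with base R's grDevices functions (attached by default) can make this concrete:

```r
# Split a HEX code into its red, green and blue components (0-255)
col2rgb("#FF0000")

# Go the other way: build a HEX code from RGB values on a 0-255 scale
rgb(255, 0, 0, maxColorValue = 255)

# Named colors can be converted too, e.g. to find the HEX code of "steelblue"
rgb(t(col2rgb("steelblue")), maxColorValue = 255)
```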
As it stands, the solutions to all the challenges are commented out in the Rmd files. If you want to double check your answer, you can look at the source code of the Rmd files on GitHub.
Show how to use the 'Zoom' button in the RStudio Plots pane to enlarge graphs without constantly resizing windows.
If a package will not install, try a different CRAN mirror: Tools > Global Options > Packages > CRAN Mirror.
Alternatively, you can download the package from CRAN and install it from the ZIP file: Tools > Install Packages > set to 'from Zip/TAR'.
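Both workarounds can also be done from the R console, which helps when the RStudio menus look different on a learner's machine. A sketch: the mirror URL is CRAN's "cloud" mirror, and the package file path is a hypothetical placeholder for whatever file was downloaded from CRAN.

```r
# 1. Choose a CRAN mirror for this session from the console
options(repos = c(CRAN = "https://cloud.r-project.org"))
# install.packages("dplyr")  # would now download from the mirror set above

# 2. Install from a package file downloaded by hand from CRAN.
#    Hypothetical path: replace it with the file on the learner's machine.
pkg_file <- "~/Downloads/dplyr.zip"
if (file.exists(pkg_file)) {
  # repos = NULL installs from the local file instead of a repository;
  # type = "win.binary" matches a Windows .zip (use "source" for a .tar.gz)
  install.packages(pkg_file, repos = NULL, type = "win.binary")
}
```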
It is important that R and the R packages are installed locally, not on a network drive. If a learner is using a machine with multiple users where their account is not stored locally (as often happens on university computers), this can create a variety of issues. Hopefully the learner will notice these issues beforehand, but depending on the machine and how the IT staff have set it up, it may be very difficult or even impossible to make R work without their help.
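To see where R and its packages actually live on a learner's machine, you could print the library paths and the installation directory. The network check below is a rough heuristic of our own: it only flags Windows-style UNC paths that start with two backslashes or two slashes, and it does not detect mapped drive letters.

```r
# Folders where R looks for (and installs) packages, in priority order
lib_paths <- .libPaths()
print(lib_paths)

# Installation directory of the running R
print(R.home())

# Rough heuristic: UNC paths such as \\server\share usually mean a network drive
all_paths <- c(lib_paths, R.home())
on_network <- grepl("^(\\\\\\\\|//)", all_paths)
if (any(on_network)) {
  message("Some R paths look like network locations: ",
          paste(all_paths[on_network], collapse = ", "))
}
```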
If learners are having issues with one package, they may have issues with another. It's often easier to make sure they have all the needed packages installed at once, rather than deal with these issues over and over. Here is a list of all necessary packages for these lessons.
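One way to check everything in a single pass is to ask which packages are missing and install only those. The vector below repeats the individual packages named earlier for the tidyverse workaround; extend it with the rest of the packages the lessons need. The helper name find_missing() is hypothetical.

```r
# Packages to check (extend with the full list for the lessons)
needed <- c("readr", "lubridate", "dplyr", "tidyr", "ggplot2", "dbplyr")

# Hypothetical helper: return the packages that cannot be loaded.
# requireNamespace() loads each namespace if it is installed and
# returns FALSE, without an error, if it is not.
find_missing <- function(pkgs) {
  ok <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
  pkgs[!ok]
}

missing_pkgs <- find_missing(needed)
missing_pkgs                       # character(0) means everything is installed
# install.packages(missing_pkgs)   # run this if anything is listed
```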
If you encounter a problem during a workshop, feel free to contact the maintainers by email or open an issue.
For more in-depth coverage of the workshop topics, you may want to read "R for Data Science" by Hadley Wickham and Garrett Grolemund.