## Learning Objectives

• Articulate motivations for this lesson
• Introduce participants to the RStudio interface
• Set up participants to have a working directory with a data/ folder inside
• Introduce R syntax
• Point to relevant information on how to get help, and understand how to ask well-formulated questions

# Before we get started

• Start RStudio (your instructor will demonstrate)
• Under the File menu, click on New project, choose New directory, then Empty project
• Enter a name for this new folder, and choose a convenient location for it. This will be your working directory for the rest of the day (e.g., ~/data-carpentry)
• Click on “Create project”
• Under the Files tab on the right of the screen, click on New Folder and create a folder named data within your newly created working directory (e.g., ~/data-carpentry/data)
• Create a new R script (File > New File > R script) and save it in your working directory (e.g., data-carpentry-script.R)

Your working directory should now contain the data folder and the R script you just saved.
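
One way to check the setup from the R console is sketched below. It assumes you opened the project as described above, so that R's working directory is the project folder; the variable names are only illustrative.

```r
# Where is R currently working? In an RStudio project this is the project folder.
project_dir <- getwd()
print(project_dir)

# List the contents; you would expect to see the data folder and your script.
print(list.files(project_dir))

# TRUE if a folder named data exists inside the working directory.
has_data_dir <- dir.exists(file.path(project_dir, "data"))
print(has_data_dir)
```

If the last line prints FALSE, the data folder was probably created somewhere other than the project folder.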

# Basics of R

R is a versatile, open source programming/scripting language that’s useful for both statistics and data science. It was inspired by the programming language S.

• Open source software under GPL.
• Comparable, and often superior, to commercial alternatives. R has over 7,000 user-contributed packages at this time. It’s widely used in both academia and industry.
• Available on all platforms.
• Not just for statistics, but also general purpose programming.
• Object-oriented and functional.
• Large and growing community of peers.
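
As a small, hedged illustration of the "object-oriented and functional" point above, the sketch below passes a function to another function and inspects the class of a fitted model. The names square, squares and fit are made up for this example; cars is a small data set that ships with R.

```r
# Functional: functions are ordinary values that can be passed to other functions.
square <- function(x) x^2
squares <- sapply(1:3, square)
print(squares)

# Object-oriented: every object has a class, which determines how generic
# functions such as print() and summary() treat it.
fit <- lm(dist ~ speed, data = cars)
print(class(fit))
```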

# Presentation of RStudio

Let’s start by learning about our tool. RStudio is an interactive environment for running R.

Your instructor will demonstrate the standard windows:

• Console
• Scripts
• Environment/History
• Files/Plots/Packages/Help

You can work directly in the Console, where you can type in R code, run it immediately, and see the output. However, it is a good idea to create an R script to make your code reproducible. Our end goal is not just to “do stuff” but to do it in a way that anyone can easily and exactly replicate our workflow and results.

## Good practices

There are two main ways of interacting with R: using the console or using script files (plain text files that contain your code).

The recommended approach when working on a data analysis project is dubbed “the source code is real”. The objects you create should be seen as disposable, because they are the direct realization of your code. Every object in your analysis can be recreated from your code, and all steps are documented. Therefore, it is best to enter as few commands as possible in the R console. Instead, all code should be written in script files and evaluated from there. This is where RStudio is really useful, as it makes passing code between your script and the R console easy. The R console should be used to inspect objects, test a function or get help. With this approach, the .Rhistory file automatically created during your session should not be very useful.
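
To make the idea concrete, here is a minimal sketch of what a script file might contain. The object names and values are hypothetical; the point is only that deleting an object costs nothing when the code that produced it is saved.

```r
# Hypothetical contents of a script such as data-carpentry-script.R

# Step 1: define the input values.
weights_kg <- c(50, 60, 65)

# Step 2: derive a new object from them.
weights_lb <- weights_kg * 2.2

# The derived object is disposable: remove it ...
rm(weights_lb)

# ... and recreate it exactly by running the same line of the script again.
weights_lb <- weights_kg * 2.2
print(weights_lb)
```

In RStudio, lines like these can be sent from the script to the console one at a time, so the script remains the complete record of what was done.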

Similarly, you should separate the original data (raw data) from intermediate datasets that you may create for the needs of a particular analysis. For instance, you may want to create a data/ directory within your working directory that stores the raw data, and have a data_output/ directory for intermediate datasets and a figure_output/ directory for the plots you will generate.
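
The folder layout described above can also be created from R rather than through the Files tab. This is a sketch that assumes the working directory is your project folder; it only creates folders that do not already exist.

```r
# Folder names taken from the layout described above.
project_dirs <- c("data", "data_output", "figure_output")

for (d in project_dirs) {
  # Create the folder only if it is missing, so rerunning the script is harmless.
  if (!dir.exists(d)) {
    dir.create(d)
  }
}

# Confirm that all three folders are now present.
print(dir.exists(project_dirs))
```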

## Seeking help

### I know the name of the function I want to use, but I’m not sure how to use it

If you need help with a specific function, let’s say barplot(), you can type:

?barplot

If you just need to remind yourself of the names of the arguments, you can use:

args(lm)
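
If you want the argument names as a plain character vector (for example, to check whether a particular argument exists), one option is to combine formals() and names(). This is a small sketch using lm as above; arg_names is just an illustrative variable name.

```r
# Show the function signature, as args(lm) does at the console.
print(args(lm))

# Extract only the argument names as a character vector.
arg_names <- names(formals(lm))
print(arg_names)

# Check for a specific argument by name.
print("formula" %in% arg_names)
```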

If the function is part of a package that is installed on your computer but you don’t remember which one, you can type:

??useMart

### I want to use a function that does X, there must be a function for it but I don’t know which one…

If you are looking for a function to do a particular task, you can use help.search() (note that it only searches the installed packages):

help.search("kruskal")
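
Two related options are sketched below. apropos() matches the names of objects in the packages currently attached to your session, and the package argument of help.search() restricts the search to a given installed package. The variable names are illustrative, and stats is used only because it ships with R.

```r
# Search object names on the search path (attached packages only).
name_matches <- apropos("kruskal")
print(name_matches)

# Search the help pages of a single installed package.
stats_hits <- help.search("kruskal", package = "stats")

# The matching help topics are stored in the matches component of the result.
print(unique(stats_hits$matches$Topic))
```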

If you can’t find what you are looking for, you can use the rdocumentation.org website, which searches through the help files across all available packages.

We will go into more detail on getting help with more complex problems, and on problem solving, at the end of the class.
