Learning Objectives

Demonstration of what we will learn to do

Before we get started

Your working directory should now look like this:

What your working directory should look like at the beginning of this lesson

Basics of R

R is a versatile, open source programming and scripting language that’s useful both for statistics and for data science. It was inspired by the programming language S.

Presentation of RStudio

Let’s start by learning about our tool. RStudio is an interactive environment for running R.

Your instructor will demonstrate the standard windows:

You can work directly in the Console, where you can type R code, run it immediately and see the output. However, it is a good idea to write your code in an R script instead, so that your work is reproducible. Our end goal is not just to “do stuff” but to do it in a way that anyone can easily and exactly replicate our workflow and results.

Good practices

There are two main ways of interacting with R: using the console or using script files (plain text files that contain your code).

The recommended approach when working on a data analysis project is dubbed “the source code is real”: the objects you create should be seen as disposable, because they are the direct product of your code. Every object in your analysis can be recreated from your code, and every step is documented. Therefore, you should enter as few commands as possible directly in the R console. Instead, write all your code in script files and run it from there. This is where RStudio is really useful, as it makes it easy to send code from your script to the R console. Use the console to inspect objects, test a function or get help. With this approach, you should rarely need the .Rhistory file that R creates automatically during your session.
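As an illustration, here is a minimal script written in this spirit. The file name analysis.R and the height values are made up for the example; the point is that every object is computed from the code, so you could delete them all and get them back by re-running the script.

```r
# analysis.R (hypothetical file name): a tiny, fully scripted analysis.
# Each object is derived from code, so it can be deleted and recreated at will.

# Made-up example data: heights in centimetres
heights_cm <- c(172, 165, 180, 158)

# Derived objects: convert to metres, then summarise
heights_m <- heights_cm / 100
mean_height_m <- mean(heights_m)

# Show the result; further interactive inspection belongs in the console
print(mean_height_m)
```

Running source("analysis.R") from the console, or clicking the Source button in RStudio’s editor, re-executes the whole file from top to bottom and recreates all of these objects.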

Similarly, you should keep the original (raw) data separate from any intermediate datasets you create for a particular analysis. For instance, you may want to create a data/ directory within your working directory to store the raw data, a data_output/ directory for intermediate datasets and a figure_output/ directory for the plots you generate.
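One way to set up this layout from R itself is sketched below, using only base R. The built-in iris dataset stands in for your own data, and the file names iris_summary.csv and sepal_length.pdf are made up for the example.

```r
# Create the suggested folders inside the current working directory.
# showWarnings = FALSE keeps dir.create() quiet if a folder already exists.
for (folder in c("data", "data_output", "figure_output")) {
  dir.create(folder, showWarnings = FALSE)
}

# Raw data would live, untouched, in data/.
# Intermediate datasets derived from it are written to data_output/.
iris_summary <- aggregate(Sepal.Length ~ Species, data = iris, FUN = mean)
write.csv(iris_summary, file.path("data_output", "iris_summary.csv"),
          row.names = FALSE)

# Figures are written to figure_output/.
pdf(file.path("figure_output", "sepal_length.pdf"))
boxplot(Sepal.Length ~ Species, data = iris)
invisible(dev.off())  # close the PDF device so the file is written
```

Because the raw data is only ever read and never overwritten, everything in data_output/ and figure_output/ can be regenerated by re-running your scripts.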

Seeking help

I know the name of the function I want to use, but I’m not sure how to use it

If you need help with a specific function, let’s say barplot(), you can type:

?barplot

If you just need to remind yourself of the names of the arguments, you can use:

args(lm)
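If you want to use the argument names in code rather than just read them, the base R function formals() returns the arguments as a list, and names() turns that into a plain character vector. This is a small complement to args(), not a replacement for the help page; the variable name lm_arguments is just for illustration.

```r
# args() prints the function's signature (argument names and defaults)
args(lm)

# formals() returns the same arguments as a list, so their names can be
# extracted and reused programmatically
lm_arguments <- names(formals(lm))
print(lm_arguments)

# For example, check whether lm() accepts a "weights" argument
"weights" %in% lm_arguments
```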

If the function is part of a package that is installed on your computer but you don’t remember which one, you can type:

??useMart

I want to use a function that does X, there must be a function for it but I don’t know which one…

If you are looking for a function to do a particular task, you can use help.search() (note that it only searches the packages installed on your computer):

help.search("kruskal")
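help.search() also returns its results as an object (??kruskal is a shortcut for the same search), which can be handy when many topics match. The sketch below assumes that the matches element of that object is a data frame with Topic, Package and Title columns, as in current versions of R; if it looks different on your installation, str(hits) will show you its structure.

```r
# Search the help pages of the installed packages for "kruskal"
hits <- help.search("kruskal")

# One row per matching help topic
# (column names Topic, Package and Title are assumed here)
hits$matches[, c("Topic", "Package", "Title")]
```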

If you can’t find what you are looking for, you can use the rdocumentation.org website, which searches the help files across all available packages.

We will go into more detail on getting help with more complex problems, and on problem solving in general, at the end of the class.
