Summary and Setup

Welcome to R! Working with a programming language (especially if it’s your first time) often feels intimidating, but the rewards outweigh any frustrations. An important secret of coding is that even experienced programmers find it difficult and frustrating at times – so if even the best feel that way, why let intimidation stop you? Given time and practice* you will soon find it easier and easier to accomplish what you want.

Why learn to code? Bioinformatics – like biology – is messy. Different organisms, different systems, and different conditions all behave differently. Experiments at the bench require a variety of approaches – from tested protocols to trial-and-error. Bioinformatics is also an experimental science; otherwise, we could use the same software and the same parameters for every genome assembly. Learning to code opens up the full possibilities of computing, especially given that most bioinformatics tools exist only at the command line. Think of it this way: if you could only do molecular biology using a kit, you could probably accomplish a fair amount. However, if you don’t understand the biochemistry of the kit, how would you troubleshoot? How would you do experiments for which there are no kits?

R is one of the most widely-used and powerful programming languages in bioinformatics. R especially shines where a variety of statistical tools are required (e.g. RNA-Seq, population genomics, etc.) and in the generation of publication-quality graphs and figures. Rather than get into an R vs. Python debate (both are useful), keep in mind that many of the concepts you will learn apply to Python and other programming languages.

Finally, we won’t lie; R is not the easiest-to-learn programming language ever created. So, don’t get discouraged! The truth is that even with the modest amount of R we will cover today, you can start using some sophisticated R software packages, and have a general sense of how to interpret an R script. Get through these lessons, and you are on your way to being an accomplished R user!

* We very intentionally used the word practice. One of the other “secrets” of programming is that you can only learn so much by reading about it. Do the exercises in class, re-do them on your own, and then work on your own problems.

Prerequisites

  • Experimenter’s Mindset: We define the “Experimenter’s mindset” as an approach to bioinformatics that treats it like any other experiment. There are probably a variety of metaphors we could employ (data are our reagents, scripts are our protocols, etc.), but the most important idea of the mindset is to remind you that, as a researcher, you need to apply all of your training at the bench or in the field to your analyses. Evaluate results critically, and don’t expect that things will always work the first time, or that they will always work in the same way.
  • Genomics Data Carpentry Instance: This lesson assumes you are using a Genomics Data Carpentry instance as described on the Genomics Workshop setup page.

This is an additional lesson for the genomics workshop. Detailed setup instructions for the main workshop appear below and can also be found on the main setup page. If you are only here for the Intro to R and RStudio for Genomics lesson and do not wish to work on the cloud, you can choose option B below, where you only need to download the data files to the local working directory in which you will create your R project.

R Genomics workshop setup directions


For general setup for the genomics lessons, see the setup page of the genomics workshop.

Installing R


This workshop is designed to be run on pre-imaged Amazon Web Services (AWS) instances. With the exception of a spreadsheet program, all of the software and data used in the workshop are hosted on an Amazon Machine Image (AMI). However, it is beneficial to understand how to install both R and RStudio locally (on your own machine).

Typically, we suggest you install R for your operating system from the Comprehensive R Archive Network (CRAN) at this page.

Follow the CRAN instructions for downloading the Windows binary.

Follow the CRAN instructions for downloading the macOS .pkg file; ensure that you download the package that corresponds to your Mac CPU configuration (e.g. “For Apple silicon (M1/M2) Macs” or “For older Intel Macs”).

Follow the CRAN instructions for your Linux distribution; the instructions for Ubuntu are found here.
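
Once R is installed, one way to confirm that the installation works is to start R and ask it to report its own version and platform. The following is a minimal sketch using only base R; the variable names (r_version, r_platform) are ours, chosen for illustration, and the exact text printed will differ from machine to machine.

```r
# Ask the running R session which version it is.
# R.version.string is a base R character value, e.g. a string starting with "R".
r_version <- R.version.string
print(r_version)

# Ask which platform this build of R targets. On a Mac, this can help
# confirm whether the Apple silicon or the Intel build was installed.
r_platform <- R.version$platform
print(r_platform)
```

If both lines print without an error, R itself is installed and running.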

Installing RStudio


The RStudio integrated development environment (IDE) is a popular way to use R. We will cover what RStudio is in the lesson. Users of any operating system can go to the Posit download page to download the appropriate version. In most cases, the Posit website will detect your operating system and suggest the correct download. The same page links to alternatives.
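
After installing RStudio, a quick check from the R console can tell you whether your session is running inside RStudio or in plain R. This is a small sketch that relies on the RSTUDIO environment variable, which RStudio sets to "1" in the sessions it launches; the variable name in_rstudio is our own, chosen for illustration.

```r
# RStudio sets the environment variable RSTUDIO to "1" for sessions it starts.
# In plain R (for example, R started from a terminal) it is normally unset,
# in which case Sys.getenv() returns an empty string.
in_rstudio <- identical(Sys.getenv("RSTUDIO"), "1")

if (in_rstudio) {
  message("This R session is running inside RStudio.")
} else {
  message("This R session does not appear to be running inside RStudio.")
}
```

Either message means R is working; the first also confirms that RStudio found your R installation.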