This lesson is in the early stages of development (Alpha version)

Intro to R and RStudio for Genomics: Glossary

Key Points

Introducing R and RStudio IDE
  • R is a powerful, popular open-source scripting language

  • You can customize the layout of RStudio, and use the project feature to manage the files and packages used in your analysis

  • RStudio allows you to run R in an easy-to-use interface and makes it easy to find help

R Basics
  • Effectively using R is a journey of months or years. Still you don’t have to be an expert to use R and you can start using and analyzing your data with with about a day’s worth of training

  • It is important to understand how data are organized by R in a given object type and how the mode of that type (e.g. numeric, character, logical, etc.) will determine how R will operate on that data.

  • Working with vectors effectively prepares you for understanding how data are organized in R.

R Basics continued - factors and data frames
  • It is easy to import data into R from tabular formats including Excel. However, you still need to check that R has imported and interpreted your data correctly

  • There are best practices for organizing your data (keeping it tidy) and R is great for this

  • Base R has many useful functions for manipulating your data, but all of R’s capabilities are greatly enhanced by software packages developed by the community

Aggregating and Analyzing Data with dplyr
  • Use the dplyr package to manipulate dataframes.

  • Use select() to choose variables from a dataframe.

  • Use filter() to choose data based on values.

  • Use group_by() and summarize() to work with subsets of data.

  • Use mutate() to create new variables.

Data Visualization with ggplot2
Producing Reports With knitr
  • Mix reporting written in R Markdown with software written in R.

  • Specify chunk options to control formatting.

  • Use knitr to convert these documents into PDF and other formats.