Introducing R and RStudio IDE
|
R is a powerful, popular open-source scripting language.
You can customize the layout of RStudio, and use the project feature to manage the files and packages used in your analysis.
RStudio allows you to run R in an easy-to-use interface and makes it easy to find help.
|
R Basics
|
Effectively using R is a journey of months or years. Still, you don’t have to be an expert to use R, and you can start analyzing your data with about a day’s worth of training.
It is important to understand how data are organized by R in a given object type and how the mode of that type (e.g. numeric, character, logical, etc.) will determine how R will operate on that data.
Working with vectors effectively prepares you for understanding how data are organized in R.
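A minimal sketch of how a vector's mode shapes what R does with it. The values and names are made up for illustration: `mode()` reports the type, arithmetic and counting depend on it, and mixing types in `c()` silently coerces everything to a single mode.

```r
# Each vector holds elements of a single mode (values are hypothetical)
positions <- c(1573, 2035, 9972)          # numeric
genes <- c("gene_a", "gene_b", "gene_c")  # character
is_snp <- c(TRUE, FALSE, TRUE)            # logical

mode(positions)  # "numeric"
mode(genes)      # "character"
mode(is_snp)     # "logical"

# The mode decides which operations make sense
positions * 2    # element-wise arithmetic on a numeric vector
sum(is_snp)      # TRUE counts as 1 and FALSE as 0, so this is 2

# Mixing modes in c() coerces every element to the most flexible mode
mixed <- c(1, "a", TRUE)
mode(mixed)      # "character": 1 and TRUE became "1" and "TRUE"

# Index by position (starting at 1) or with a logical vector
positions[2]       # 2035
positions[is_snp]  # 1573 9972
```

Coercion is the usual surprise here: a single character value in an otherwise numeric vector turns the whole vector into character data.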
|
Introduction to the example dataset and file type
|
The dataset comes from a real-world experiment in E. coli.
Publicly available FASTQ files can be downloaded from NCBI SRA.
Several steps are taken outside of R/RStudio to create VCF files from FASTQ files.
VCF (Variant Call Format) files store variant calls in a tab-delimited text format.
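As a rough illustration of that format, the sketch below reads a tiny, made-up VCF excerpt with base R. The chromosome name, positions and INFO values are invented. The code assumes only the fixed VCF columns (`##` meta-information lines, a `#CHROM` header line, then tab-separated records). Real VCF files have more structure (per-sample genotype columns, structured INFO fields), so this is not a general VCF parser.

```r
# Made-up VCF excerpt: "##" lines are metadata, "#CHROM" is the column header
vcf_lines <- c(
  "##fileformat=VCFv4.2",
  "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO",
  "example_chrom\t9972\t.\tT\tG\t91\t.\tDP=4",
  "example_chrom\t263235\t.\tG\tT\t85\t.\tDP=6"
)

# Recover the column names from the "#CHROM" header line
header <- vcf_lines[startsWith(vcf_lines, "#CHROM")]
col_names <- strsplit(sub("^#", "", header), "\t")[[1]]

# Keep only the data records (lines not starting with "#")
records <- vcf_lines[!startsWith(vcf_lines, "#")]

# Explicit column classes stop R from reading bases such as "T" as TRUE
variants <- read.table(
  text = records, sep = "\t", col.names = col_names,
  colClasses = c("character", "integer", "character", "character",
                 "character", "numeric", "character", "character")
)

variants[, c("POS", "REF", "ALT")]
nrow(variants)  # 2 variant calls
```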
|
R Basics continued - factors and data frames
|
It is easy to import data into R from tabular formats, including Excel. However, you still need to check that R has imported and interpreted your data correctly.
There are best practices for organizing your data (keeping it tidy), and R is great for this.
Base R has many useful functions for manipulating your data, but all of R’s capabilities are greatly enhanced by software packages developed by the community.
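A small sketch of importing a table, checking how R interpreted it, and turning a column into a factor. The CSV content and column names are hypothetical. The example assumes R 4.0 or later, where `read.csv()` keeps text columns as character rather than converting them to factors.

```r
# Made-up CSV text standing in for an imported file
csv_text <- "sample,genotype,depth
s1,mutant,12
s2,wild_type,8
s3,mutant,15"

df <- read.csv(text = csv_text)

# Always check how each column was interpreted before analyzing
str(df)  # genotype is character, depth is integer

# Store the categorical column as a factor with an explicit level order
df$genotype <- factor(df$genotype, levels = c("wild_type", "mutant"))
levels(df$genotype)  # "wild_type" "mutant"
table(df$genotype)   # counts per level

# Base R subsetting: rows where depth is at least 10
df[df$depth >= 10, ]
```

Setting `levels` explicitly controls the order used in tables and plots instead of relying on the default alphabetical order.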
|
Using packages from Bioconductor
|
Bioconductor is an alternative package repository for bioinformatics packages.
Installing packages from Bioconductor requires a different method, BiocManager::install(), rather than the install.packages() function used for CRAN.
Check Bioconductor to see if there is a package relevant to your analysis before writing code yourself.
|
Data Wrangling and Analyses with Tidyverse
|
Use the dplyr package to manipulate data frames.
Use glimpse() to quickly look at your data frame.
Use select() to choose variables from a data frame.
Use filter() to choose data based on values.
Use mutate() to create new variables.
Use group_by() and summarize() to work with subsets of data.
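The dplyr verbs listed above can be chained into one pipeline. This is a minimal sketch on a made-up variant table; the column names (`sample`, `POS`, `REF`, `QUAL`, `DP`) are assumptions for illustration. It requires the dplyr package to be installed.

```r
library(dplyr)

# Hypothetical variant calls for two samples
variants <- data.frame(
  sample = c("s1", "s1", "s2", "s2", "s2"),
  POS    = c(100, 250, 100, 400, 900),
  REF    = c("A", "G", "A", "T", "C"),
  QUAL   = c(90, 30, 85, 60, 99),
  DP     = c(10, 4, 12, 7, 20)
)

glimpse(variants)  # compact view of columns and their types

high_quality <- variants %>%
  select(sample, QUAL, DP) %>%  # keep only these variables
  filter(QUAL >= 50) %>%        # keep rows with QUAL of at least 50
  mutate(log_DP = log2(DP))     # add a derived variable

per_sample <- high_quality %>%
  group_by(sample) %>%          # work on each sample separately
  summarize(n_variants = n(), mean_DP = mean(DP))

per_sample  # s1: 1 variant, mean_DP 10; s2: 3 variants, mean_DP 13
```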
|
Data Visualization with ggplot2
|
|
Producing Reports With knitr
|
Keep reporting and R software together in one document using R Markdown.
Control formatting using chunk options.
knitr, together with the rmarkdown package and Pandoc, can convert R Markdown documents to PDF and other formats.
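A sketch of how a chunk option changes the report: it writes a tiny R Markdown file to a temporary location and knits it to Markdown with `knitr::knit()`. The chunk label `count-samples` and the report text are hypothetical. The example assumes the knitr package is installed. Turning the resulting Markdown into PDF or HTML is a further step, for example with `rmarkdown::render()`, which uses Pandoc.

```r
library(knitr)

fence <- strrep("`", 3)  # chunk delimiter, built to avoid literal backticks

# Minimal R Markdown source: echo = FALSE hides the code, keeps the output
rmd_lines <- c(
  "# Tiny report",
  "",
  "Number of samples:",
  "",
  paste0(fence, "{r count-samples, echo = FALSE}"),
  "length(c('s1', 's2', 's3'))",
  fence
)

rmd_file <- tempfile(fileext = ".Rmd")
md_file  <- tempfile(fileext = ".md")
writeLines(rmd_lines, rmd_file)

# Run the chunk and write plain Markdown (no Pandoc needed for this step)
knit(rmd_file, output = md_file, quiet = TRUE)

report <- readLines(md_file)
cat(report, sep = "\n")  # shows the result "## [1] 3" but not the code
```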
|
Getting help with R
|
R provides thousands of functions for analyzing data, and several ways to get help.
Using R will mean searching for help online, and there are tips and resources on how to search effectively.
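A few built-in ways to get help from inside R, sketched with the base function `mean()` as the example topic. To search all installed help pages by keyword, `??"standard deviation"` (equivalent to `help.search()`) also works, but it is omitted from the block because it can be slow the first time it runs.

```r
# Open the help page for a function (same as ?mean; RStudio shows it in the Help pane)
help("mean")

# Show only a function's arguments
args(mean)

# List objects whose names start with "mean"
matches <- apropos("^mean")
matches

# Run the examples from the help page
example("mean")
```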
|