Introduction to R and RStudio

  • R is both a programming language and the software that runs commands written in that language
  • RStudio is software to make it easier to write and run code in R
  • Use R Projects to keep your work organized and self-contained
  • Write your code in scripts for reproducibility and portability

Data visualization with ggplot2

  • the ggplot() function initiates a plot, and geom_ functions add representations of your data
  • use aes() when mapping a variable from the data to a part of the plot
  • use scale_ functions to modify the scales used to represent variables
  • use premade theme_ functions to broadly change appearance, and the theme() function to fine-tune
  • start simple and build your plots iteratively
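
The points above can be sketched as one plot built up a layer at a time. This is only an illustration: it assumes the ggplot2 package is installed and uses R's built-in mtcars dataset, which is not part of the lesson data.

```r
library(ggplot2)

# Step 1: initiate the plot; aes() maps data columns to parts of the plot
p <- ggplot(data = mtcars,
            mapping = aes(x = wt, y = mpg, color = factor(cyl)))

# Step 2: a geom_ function adds a representation of the data (points here)
p <- p + geom_point()

# Step 3: scale_ functions change how variables are represented
p <- p +
  scale_x_log10() +
  scale_color_viridis_d(name = "Cylinders")

# Step 4: a premade theme for the overall look, then theme() to fine-tune
p <- p +
  theme_bw() +
  theme(legend.position = "bottom")

# Draw the finished plot
print(p)
```

Printing p after each step shows what that step changed, which is the "start simple and build iteratively" workflow in practice.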

Exploring and understanding data

  • functions like head(), str(), and summary() are useful for exploring data.frames
  • most things in R are vectors, vectors stitched together, or functions
  • make sure to use class() to check vector types, especially when using new functions
  • factors can be useful, but behave differently from character vectors
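
A short sketch of these checks in base R follows. The surveys data.frame and the years vector are small hypothetical examples invented here for illustration, not the lesson's dataset.

```r
# A small hypothetical data.frame (values invented for illustration)
surveys <- data.frame(
  species = c("DM", "PF", "DM", "OT"),
  weight  = c(40, 8, 44, 22),
  stringsAsFactors = FALSE
)

# Quick looks at the structure and contents
head(surveys, n = 2)
str(surveys)
summary(surveys)

# A data.frame is a list of equal-length vectors, one per column
is.list(surveys)          # TRUE
class(surveys$species)    # "character"
class(surveys$weight)     # "numeric"

# Factors store integer codes plus a set of levels
species_factor <- factor(surveys$species)
levels(species_factor)      # "DM" "OT" "PF"
as.integer(species_factor)  # 1 3 1 2

# A common surprise: converting a number-like factor returns the codes
years <- factor(c("2010", "1990", "2010"))
as.numeric(years)                 # 2 1 2
as.numeric(as.character(years))   # 2010 1990 2010
```

Checking class() before passing a column to an unfamiliar function helps catch the factor versus character difference shown in the last two lines.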

Working with data

  • use filter() to subset rows and select() to subset columns
  • build up pipelines one step at a time before assigning the result
  • it is often best to keep components of dates separate until needed, then use mutate() to make a date column
  • use group_by() with summarize() to collapse each group to one row, or with mutate() to compute group-wise values while keeping every row
  • pivot_wider() and pivot_longer() are powerful for reshaping data, but plan the shape you want before using them
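
The verbs above can be sketched together on a small table. This assumes the dplyr and tidyr packages are installed; the surveys tibble and all of its values are hypothetical, and the date column is built with base R's as.Date() as one possible approach.

```r
library(dplyr)
library(tidyr)

# Hypothetical data with date components kept in separate columns
surveys <- tibble(
  year    = c(2020, 2020, 2021, 2021, 2021),
  month   = c(1, 2, 1, 1, 3),
  day     = c(15, 3, 20, 21, 9),
  species = c("DM", "PF", "DM", "PF", "DM"),
  weight  = c(40, 8, 44, 7, 42)
)

# filter() subsets rows, select() subsets columns
heavy <- surveys %>%
  filter(weight > 10) %>%
  select(year, species, weight)

# mutate() builds a date column from the components when it is needed
with_dates <- surveys %>%
  mutate(date = as.Date(paste(year, month, day, sep = "-"),
                        format = "%Y-%m-%d"))

# group_by() + summarize(): one row per group
mean_by_species <- surveys %>%
  group_by(species) %>%
  summarize(mean_weight = mean(weight), n = n())

# group_by() + mutate(): group-wise values, same number of rows
centered <- surveys %>%
  group_by(species) %>%
  mutate(weight_diff = weight - mean(weight)) %>%
  ungroup()

# pivot_wider(): one column per species, one row per year
wide <- surveys %>%
  group_by(year, species) %>%
  summarize(mean_weight = mean(weight), .groups = "drop") %>%
  pivot_wider(names_from = species, values_from = mean_weight)

# pivot_longer(): back to one row per year and species
long <- wide %>%
  pivot_longer(cols = -year, names_to = "species",
               values_to = "mean_weight")
```

Each pipeline can be run without the assignment first, adding one step at a time and inspecting the printed result, before storing it under a name.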