Learning Objectives

After completing this assignment, students should be able to:

• install and load an R package
• understand the data manipulation functions of `dplyr`
• carry out a simple workflow that imports and analyzes data

Optional Resources:

Lecture Notes

Place this code at the start of the assignment to load all the required packages.

```r
library(dplyr)
```

Exercises

1. Shrub Volume Aggregation (10 pts)

This is a follow-up to Shrub Volume Data Basics.

Dr. Morales wants some summary data of the plants at her sites and for her experiments. If the file shrub-volume-data.csv is not already in your workspace, download it.

This code calculates the average height of a plant at each site:

```r
shrub_dims <- read.csv('shrub-volume-data.csv')
by_site <- group_by(shrub_dims, site)
avg_height <- summarize(by_site, avg_height = mean(height))
```
1. Modify the code to calculate and print the average height of a plant in each experiment.
2. Add a line of code to use `max()` to determine the maximum height of a plant at each site.
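
As a minimal sketch of the grouping pattern used above, the following runs on a small made-up data frame. The name `toy_plants` and its `plot` and `height` columns are hypothetical and are not Dr. Morales's data. It uses `median()` purely for illustration; any function that reduces a vector to a single value should fit the same slot.

```r
library(dplyr)

# Hypothetical toy data, invented for illustration only
toy_plants <- data.frame(
  plot = c("a", "a", "b", "b", "b"),
  height = c(1.0, 3.0, 2.0, 4.0, 6.0)
)

# Group rows by plot, then collapse each group to a single summary row
by_plot <- group_by(toy_plants, plot)
median_height <- summarize(by_plot, median_height = median(height))
print(median_height)
```

The result has one row per distinct value of the grouping column.
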
2. Shrub Volume Join (10 pts)

This is a follow-up to Shrub Volume Aggregation.

In addition to the main data table on shrub dimensions, Dr. Morales has two additional data tables. The first describes the manipulation for each experiment. The second provides information about the different sites. Check whether the files `shrub-volume-experiments.csv` and `shrub-volume-sites.csv` are in your workspace (your instructor may have already added them). If not, download the experiments data and the sites data.

1. Import the experiments data and then use `inner_join` to combine it with the shrub dimensions data to add a `manipulation` column to the shrub data.
2. Import the sites data and then combine it with both the data on shrub dimensions and the data on experiments to produce a single data frame that contains all of the data.
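
To illustrate what `inner_join()` does, here is a hedged sketch on two made-up tables. The names `measurements` and `labels`, and the key column `id`, are assumptions for this example and do not come from the shrub files.

```r
library(dplyr)

# Hypothetical toy tables that share a key column named `id`
measurements <- data.frame(id = c(1, 2, 2, 3), value = c(10, 20, 25, 30))
labels <- data.frame(id = c(1, 2, 4), label = c("low", "high", "unused"))

# inner_join() keeps only the rows whose key appears in both tables
# and adds the columns of the second table to the first
joined <- inner_join(measurements, labels, by = "id")
print(joined)
```

Rows with `id` 3 and 4 are dropped here because each appears in only one of the two tables.
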
3. Portal Data Aggregation (10 pts)

If the file surveys.csv is not already in your working directory, download it.

Load `surveys.csv` into R using `read.csv()`.

1. Use the `group_by()` and `summarize()` functions to get a count of the number of individuals in each species ID.
2. Use the `group_by()` and `summarize()` functions to get a count of the number of individuals in each species ID in each year.
3. Use the `filter()`, `group_by()`, and `summarize()` functions to get the mean mass of species `DO` in each year.
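
The sketch below shows one way counting and filtered summaries can look on made-up data. The `sightings` table and its columns are hypothetical. Whether `na.rm = TRUE` is needed for `surveys.csv` depends on whether that column contains missing values, which is worth checking rather than assuming.

```r
library(dplyr)

# Hypothetical toy observations, invented for illustration
sightings <- data.frame(
  bird = c("wren", "wren", "robin", "wren", "robin"),
  season = c("spring", "spring", "spring", "fall", "fall"),
  mass = c(10, NA, 80, 12, 76)
)

# n() counts the rows in each group; grouping by two columns
# produces one count per combination of their values
counts <- sightings |>
  group_by(bird, season) |>
  summarize(count = n(), .groups = "drop")

# filter() narrows the rows first; na.rm = TRUE skips missing masses
wren_mass <- sightings |>
  filter(bird == "wren") |>
  group_by(season) |>
  summarize(mean_mass = mean(mass, na.rm = TRUE))

print(counts)
print(wren_mass)
```
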
4. Fix the Code (15 pts)

This is a follow-up to Shrub Volume Aggregation. If you don't already have the shrub volume data in your working directory, download it.

The following code is supposed to import the shrub volume data and calculate the average shrub volume for each site and, separately, for each experiment.

```r
read.csv("shrub-volume-data.csv")
shrub_data |>
  mutate(volume = length * width * height) |>
  group_by(site) |>
  summarize(mean_volume = max(volume))
shrub_data |>
  mutate(volume = length * width * height)
  group_by(experiment) |>
  summarize(mean_volume = mean(volume))
```
1. Fix the errors in the code so that it does what it's supposed to.
2. Add a comment to the top of the code explaining what it does
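
The code in this exercise chains calls with R's native pipe (`|>`, available in R 4.1 and later). As a small sketch of how that operator behaves, independent of the shrub data and of dplyr, the toy example below compares a nested call with its piped equivalent.

```r
# The native pipe passes the value on its left as the first argument
# of the call on its right, so these two lines compute the same thing
nested <- sum(sqrt(c(1, 4, 9)))
piped <- c(1, 4, 9) |> sqrt() |> sum()

print(identical(nested, piped))
```
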
5. Portal Data Joins (15 pts)

If surveys.csv, species.csv, and plots.csv are not available in your workspace, download them.

Load them into R using `read.csv()`.

1. Use `inner_join()` to create a table that contains the information from both the `surveys` table and the `species` table.
2. Use `inner_join()` twice to create a table that contains the information from all three tables.
3. Use `inner_join()` and `filter()` to get a data frame with the information from the `surveys` and `plots` tables where the `plot_type` is `Control`.
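
As a hedged illustration of chaining joins, the sketch below combines three made-up tables and then filters on a column that only exists after the join. The tables `orders`, `items`, and `shops` and their columns are hypothetical stand-ins, not the Portal tables.

```r
library(dplyr)

# Hypothetical toy tables, invented for illustration
orders <- data.frame(
  order_id = 1:4,
  item_id = c(10, 11, 10, 12),
  shop_id = c(1, 2, 2, 1)
)
items <- data.frame(item_id = c(10, 11, 12), item = c("pen", "ink", "pad"))
shops <- data.frame(shop_id = c(1, 2), shop_type = c("online", "retail"))

# Each inner_join() adds the columns of one more table,
# matched on the key column the two tables share
combined <- orders |>
  inner_join(items, by = "item_id") |>
  inner_join(shops, by = "shop_id")

# filter() keeps only rows with a given value in a joined column
online_orders <- filter(combined, shop_type == "online")
print(online_orders)
```

Naming the key with `by` is optional when the shared column names match, but stating it makes the intended match explicit.
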
6. Portal Data dplyr Review (20 pts)

If surveys.csv, species.csv, and plots.csv are not available in your workspace, download them.

Load them into R using `read.csv()`.

We want to compare the size of individuals on the `Control` plots with those on the `Long-term Krat Exclosure` plots. Create a data frame with the `year`, `genus`, `species`, `weight`, and `plot_type` for all cases where the plot type is either `Control` or `Long-term Krat Exclosure`. Only include cases where `Taxa` is `Rodent`. Remove any records where the `weight` is missing.
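
The building blocks for this kind of task can be sketched on made-up data as follows. The `samples` table and its columns are hypothetical; the point is only to show `%in%` for matching several values, `!is.na()` for dropping missing records, and `select()` for keeping chosen columns.

```r
library(dplyr)

# Hypothetical toy data, invented for illustration
samples <- data.frame(
  year = c(2001, 2001, 2002, 2002, 2003),
  treatment = c("none", "shade", "water", "none", "shade"),
  kind = c("herb", "herb", "tree", "herb", "herb"),
  mass = c(1.2, NA, 3.4, 0.9, 2.1)
)

# Conditions separated by commas in filter() must all be true;
# select() then keeps only the named columns
cleaned <- samples |>
  filter(treatment %in% c("none", "shade"), kind == "herb", !is.na(mass)) |>
  select(year, treatment, mass)

print(cleaned)
```
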

7. Extracting vectors from data frames (10 pts)

Using the Portal data `surveys` table (download a copy if it's not in your working directory):

1. Use `$` to extract the `weight` column into a vector
2. Use `[]` to extract the `month` column into a vector
3. Extract the `hindfoot_length` column into a vector and calculate the mean hindfoot length, ignoring missing (`NA`) values.
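
A minimal base R sketch of these ideas on a made-up data frame follows. The `toy` table and its columns are hypothetical, and `[[ ]]` is shown as one bracket-based option among several.

```r
# Hypothetical toy data frame, invented for illustration
toy <- data.frame(id = c(1, 2, 3), score = c(4.0, NA, 8.0))

# Both forms return a plain vector rather than a one-column data frame
ids <- toy$id
scores <- toy[["score"]]

# mean() returns NA if any value is missing unless na.rm = TRUE is set
mean_with_na <- mean(scores)
mean_without_na <- mean(scores, na.rm = TRUE)

print(mean_with_na)
print(mean_without_na)
```
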
8. Building data frames from vectors (10 pts)

You have data on the length, width, and height of 10 individuals of the yew Taxus baccata stored in the following vectors:

```r
length <- c(2.2, 2.1, 2.7, 3.0, 3.1, 2.5, 1.9, 1.1, 3.5, 2.9)
width <- c(1.3, 2.2, 1.5, 4.5, 3.1, NA, 1.8, 0.5, 2.0, 2.7)
height <- c(9.6, 7.6, 2.2, 1.5, 4.0, 3.0, 4.5, 2.3, 7.5, 3.2)
```

Make a data frame that contains these three vectors as columns along with a `genus` column containing the name Taxus on all rows and a `species` column containing the word baccata on all rows.
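
As a hedged sketch of the general approach, the example below builds a data frame from made-up vectors and fills one column with a single repeated value. The vector names and the `family` column are hypothetical and chosen only for illustration.

```r
# Hypothetical toy vectors, invented for illustration
leaf_length <- c(4.1, 3.8, 5.0)
leaf_width <- c(1.9, NA, 2.2)

# data.frame() turns each argument into a column; a length-one value
# is recycled so that it fills every row
leaves <- data.frame(
  leaf_length = leaf_length,
  leaf_width = leaf_width,
  family = "Fagaceae"
)

print(leaves)
```
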


Assignment submission & checklist