#### Learning Objectives

After completing this assignment, students should be able to:

• install and load an R package
• understand the data manipulation functions of `dplyr`
• carry out a simple data import and analysis scenario

• Topics

• `dplyr`

1. #### Shrub Volume Data Basics (10 pts)

This is a follow-up to Shrub Volume Data Frame.

Dr. Granger is interested in studying the factors controlling the size and carbon storage of shrubs. This research is part of a larger area of research trying to understand carbon storage by plants. She has conducted a small preliminary experiment looking at the effect of three different treatments on shrub volume at four different locations. She has placed the data file on the web for you to download.

Download this into your `data` folder and get familiar with the data by importing the shrub dimensions data using `read.csv()` and then:

1. Check the column names in the data using the function `names()`.
2. Use `str()` to show the structure of the data frame and its individual columns.
3. Print out the first few rows of the data using the function `head()`.

Use `dplyr` to complete the remaining tasks.

4. Select the data from the length column and print it out.
5. Select the data from the site and experiment columns and print it out.
6. Filter the data for all of the plants with heights greater than 5 and print out the result.
7. Create a new data frame called `shrub_data_w_vols` that includes all of the original data and a new column containing the volumes, and display it.
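
The functions named in the tasks above can be tried out on a small example first. The following is a minimal sketch, not a solution to the exercise: the `trees` data frame and its columns (`plot`, `diameter`, `height`) are made up for illustration, and the data is read from inline text rather than from a file in the `data` folder. It assumes `dplyr` has already been installed.

```r
library(dplyr)  # install once beforehand with install.packages("dplyr")

# Hypothetical toy data, read from inline text instead of a file on disk
trees <- read.csv(text = "plot,diameter,height
1,2.5,10.1
1,3.0,12.4
2,1.2,4.8
2,4.1,15.0")

names(trees)  # column names
str(trees)    # structure of the data frame and its columns
head(trees)   # first rows (up to six)

# select() keeps only the named columns
heights <- select(trees, height)

# filter() keeps only the rows that satisfy the condition
tall_trees <- filter(trees, height > 10)

# mutate() adds a column computed from existing columns
trees_w_ratio <- mutate(trees, ratio = height / diameter)
print(trees_w_ratio)
```

Each `dplyr` function takes a data frame as its first argument and returns a new data frame, so the original `trees` object is left unchanged.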
2. #### Shrub Volume Aggregation (10 pts)

This is a follow-up to Shrub Volume Data Basics.

Dr. Granger wants some summary data of the plants at her sites and for her experiments. Make sure you have her shrub dimensions data.

This code calculates the average height of a plant at each site:

``````
by_site <- group_by(shrub_dims, site)
avg_height <- summarize(by_site, avg_height = mean(height))
``````
1. Modify the code to calculate and print the average height of a plant in each experiment.
2. Use `max()` to determine the maximum height of a plant at each site.
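
The grouping pattern shown above can also be written as a single pipeline, and `summarize()` can compute more than one summary per group. The sketch below uses a made-up `trees` data frame (a hypothetical example, not Dr. Granger's data) and `min()` rather than the functions the exercise asks for.

```r
library(dplyr)

# Hypothetical toy data
trees <- data.frame(
  plot = c(1, 1, 2, 2),
  height = c(10.1, 12.4, 4.8, 15.0)
)

# Group the rows by plot, then compute one row of summaries per group
plot_summary <- trees %>%
  group_by(plot) %>%
  summarize(min_height = min(height), n_trees = n())

print(plot_summary)
```

The result has one row per distinct value of the grouping column and one column per named summary.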
3. #### Shrub Volume Join (15 pts)

This is a follow-up to Shrub Volume Aggregation.

Dr. Granger has kept a separate table that describes the `manipulation` for each `experiment`. Add the experiments data to your `data` folder.

Import the experiments data and then use `inner_join` to combine it with the shrub dimensions data to add a `manipulation` column to the shrub data.
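
As a rough illustration of how `inner_join()` behaves, the sketch below joins two made-up data frames on a shared column. The names `trees`, `plots`, `plot`, and `soil` are hypothetical and are not part of the assignment data.

```r
library(dplyr)

# Hypothetical toy data: measurements, and a lookup table describing each plot
trees <- data.frame(
  plot = c(1, 1, 2, 3),
  height = c(10.1, 12.4, 4.8, 15.0)
)
plots <- data.frame(
  plot = c(1, 2),
  soil = c("sand", "clay")
)

# inner_join() keeps only the rows whose key appears in both tables
trees_w_soil <- inner_join(trees, plots, by = "plot")
print(trees_w_soil)
```

Because plot 3 has no entry in `plots`, its row is dropped from the result. This is worth keeping in mind when checking row counts after a join.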

4. #### Portal Data Manipulation (25 pts)

Download a copy of the Portal Teaching Database surveys table and load it into R using `read.csv()`.

1. Use `select()` to create a new data frame with just the `year`, `month`, `day`, and `species_id` columns in that order.
2. Use `mutate()`, `select()`, and `na.omit()` to create a new data frame with the `year`, `species_id`, and weight in kilograms of each individual, with no missing (`NA`) weights.
3. Use the `filter()` function to get all of the rows in the data frame for the species ID `SH`.
4. Use the `group_by()` and `summarize()` functions to get a count of the number of individuals in each species ID.
5. Use the `group_by()` and `summarize()` functions to get a count of the number of individuals in each species ID in each year.
6. Use the `filter()`, `group_by()`, and `summarize()` functions to get the mean mass of species `DO` in each year.
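
The function combinations listed above follow a few recurring patterns, sketched below on a made-up `captures` data frame. The data, the column names (`year`, `tag`, `mass_g`), and the result names are assumptions for illustration only. The `.groups` argument assumes `dplyr` 1.0.0 or later.

```r
library(dplyr)

# Hypothetical toy data with one missing mass
captures <- data.frame(
  year = c(2000, 2000, 2001, 2001, 2001),
  tag = c("A", "B", "A", "A", "B"),
  mass_g = c(40, NA, 42, 44, 120)
)

# Convert grams to kilograms, keep three columns, drop rows containing NA
mass_kg <- captures %>%
  mutate(mass_kg = mass_g / 1000) %>%
  select(year, tag, mass_kg) %>%
  na.omit()

# Count rows in each combination of two grouping columns
counts <- captures %>%
  group_by(tag, year) %>%
  summarize(n_captures = n(), .groups = "drop")

# Restrict to one tag first, then average within each year
mean_a <- captures %>%
  filter(tag == "A") %>%
  group_by(year) %>%
  summarize(mean_mass = mean(mass_g))
```

Note that `mean()` returns `NA` if any value in a group is missing. Either remove missing values beforehand or pass `na.rm = TRUE`.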
5. #### Fix the Code (15 pts)

This is a follow-up to Shrub Volume Aggregation. If you haven’t already downloaded the shrub volume data, do so now and store it in your `data` directory.

The following code is supposed to import the shrub volume data and calculate the average shrub volume for each site and, separately, for each experiment:

``````
read.csv("data/shrub-volume-data.csv")
shrub_data %>%
mutate(volume = length * width * height) %>%
group_by(site) %>%
summarize(mean_volume = max(volume))
shrub_data %>%
mutate(volume = length * width * height)
group_by(experiment) %>%
summarize(mean_volume = mean(volume))
``````
1. Fix the errors in the code so that it does what it’s supposed to.
2. Add a comment to the top of the code explaining what it does.

Load the `surveys`, `species`, and `plots` tables into R using `read.csv()`.
1. Use `inner_join()` to create a table that contains the information from both the `surveys` table and the `species` table.
2. Use `inner_join()` twice to create a table that contains the information from all three tables.
3. Use `inner_join()` and `filter()` to get a data frame with the information from the `surveys` and `plots` tables where the `plot_type` is `Control`.
4. We want to do an analysis comparing the size of individuals on the `Control` plots to the size of those on the `Long-term Krat Exclosure` plots. Create a data frame with the `year`, `genus`, `species`, `weight`, and `plot_type` for all cases where the plot type is either `Control` or `Long-term Krat Exclosure`. Only include cases where `Taxa` is `Rodent`. Remove any records where the `weight` is missing.
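
Chaining two joins and then filtering on several conditions can be sketched as follows. All data frames and column names here (`captures`, `sites`, `spp`, `habitat`, `group`) are made up for illustration and do not match the Portal tables.

```r
library(dplyr)

# Hypothetical toy data: one main table and two lookup tables
captures <- data.frame(
  tag = c("A", "B", "C", "D"),
  site_id = c(1, 2, 3, 1),
  sp = c("x", "y", "x", "y"),
  mass_g = c(40, NA, 55, 120)
)
sites <- data.frame(
  site_id = c(1, 2, 3),
  habitat = c("Grass", "Shrub", "Rock")
)
spp <- data.frame(
  sp = c("x", "y"),
  group = c("Mammal", "Bird")
)

# Join the main table to each lookup table in turn
combined <- captures %>%
  inner_join(sites, by = "site_id") %>%
  inner_join(spp, by = "sp")

# Keep rows in either of two habitats that also have a recorded mass
kept <- combined %>%
  filter(habitat %in% c("Grass", "Shrub"), !is.na(mass_g)) %>%
  select(tag, habitat, group, mass_g)

print(kept)
```

Conditions separated by commas inside `filter()` must all be true for a row to be kept. The `%in%` operator is a compact way to test for "either of these values".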