### Learning Objectives

Following this assignment students should be able to:

• install and load an R package
• understand the data manipulation functions of `dplyr`
• execute a simple import and analyze data scenario

### Optional Resources:

#### Lecture Notes

Place this code at the start of the assignment to load all the required packages.

``````library(dplyr)
``````

### Exercises

1. #### Shrub Volume Data Basics (20 pts)

Dr. Granger is interested in studying the factors controlling the size and carbon storage of shrubs. She has conducted an experiment looking at the effect of three different treatments on shrub volume at four different locations. She has placed the data file on the web for you to download:

If the file `shrub-volume-data.csv` is not already in your working directory (it probably is if you’re taking this class using RStudio Cloud) then download it into your working directory.

Get familiar with the data by importing it using `read.csv()` and use `dplyr` to complete the following tasks.

1. Select the data from the length column and print it out (using `select`).
2. Select the data from the site and experiment columns and print it out (using `select`).
3. Add a new column named `area` containing the area of the shrub, which is the length times the width (using `mutate`).
4. Sort the data by length (using `arrange`).
5. Filter the data to include only plants with heights greater than 5 (using `filter`).
6. Filter the data to include only plants with heights greater than 4 and widths greater than 2 (using `,` or `&` to include two conditions).
7. Filter the data to include only plants from Experiment 1 or Experiment 3 (using `|` for “or”).
8. Filter the data to remove rows with null values in the `height` column (using `!is.na`)
9. Create a new data frame called `shrub_volumes` that includes all of the original data and a new column containing the volumes (length * width * height), and display it.
Expected outputs for Shrub Volume Data Basics: 1
2. #### Code Shuffle (20 pts)

We are interested in understanding the monthly variation in precipitation in Gainesville, FL. We’ll use some data from the NOAA National Climatic Data Center. Each row of the data is a year (from 1961-2013) and each column is a month (January - December).

Rearrange the following program so that it:

• Imports the data from the web into a data frame
• Calculates the mean precipitation (ppt) in each month across years
• Plots the monthly averages as a simple line plot

Finally, add a comment above the code that describes what it does. The comment character in R is `#`.

It’s OK if you don’t know exactly how the details of the program work at this point, you just need to figure out the right order of the lines based on when variables are defined and when they are used.

``````plot(monthly_mean_ppt, type = "l", xlab = "Month", ylab = "Mean Precipitation")
monthly_mean_ppt <- colMeans(ppt_data)
``````
Expected outputs for Code Shuffle: 1
3. #### Bird Banding (20 pts)

The number of birds banded at a series of sampling sites has been counted by your field crew and entered into the following vector. Counts are entered in order and sites are numbered starting at one. Cut and paste the vector into your assignment and then answer the following questions by using code and printing the result to the screen. Some R functions that will come in handy include `length()`, `max()`, `min()`, `sum()`, and `mean()`.

``````number_of_birds <- c(28, 32, 1, 0, 10, 22, 30, 19, 145, 27,
36, 25, 9, 38, 21, 12, 122, 87, 36, 3, 0, 5, 55, 62, 98, 32,
900, 33, 14, 39, 56, 81, 29, 38, 1, 0, 143, 37, 98, 77, 92,
83, 34, 98, 40, 45, 51, 17, 22, 37, 48, 38, 91, 73, 54, 46,
102, 273, 600, 10, 11)
``````
1. How many sites are there?
2. How many birds were counted at site 42?
3. What is the total number of birds counted across all of the sites?
4. What is the smallest number of birds counted?
5. What is the largest number of birds counted?
6. What is the average number of birds seen at a site?
7. How many birds were counted at the last site? Have the computer choose the last site automatically in some way, not by manually entering its position. Do you know a function that will give you the position of the last value? (since positions start at 1 the position of the last value in a vector is the same as its length).
Expected outputs for Bird Banding: 1
4. #### Portal Data Manipulation (20 pts)

If the file `surveys.csv` is not already in your working directory then download a copy.

Load the file into R using `read.csv()`.

Do not use pipes for this exercise.

1. Use `select()` to create a new data frame with just the `year`, `month`, `day`, and `species_id` columns in that order.
2. Use `mutate()`, `select()`, and `filter()` with `!is.na()` to create a new data frame with the `year`, `species_id`, and weight in kilograms of each individual, with no null weights. The weight in the table is given in grams so you will need to create a new column for weight in kilograms by dividing the weight column by 1000.
3. Use the `filter()` function to get all of the rows in the data frame for the species ID `SH`.
Expected outputs for Portal Data Manipulation: 1
5. #### Portal Data Manipulation Pipes (20 pts)

If the file `surveys.csv` is not already in your working directory then download a copy.

Load the file into R using `read.csv()`.

Use pipes (either `|>` or `%>%`) to combine the following operations to manipulate the data.

1. Use `mutate()`, `select()`, and `filter()` with `is.na()` to create a new data frame with the `year`, `species_id`, and weight in kilograms of each individual, with no null weights.
2. Use the `filter()` and `select()` to get the `year`, `month`, `day`, and `species_id` columns for all of the rows in the data frame where `species_id` is `SH`.
Expected outputs for Portal Data Manipulation Pipes: 1
6. #### Portal Data Challenge (optional)

If the file `surveys.csv` is not already in your working directory then download a copy.

Develop a data manipulation pipeline for the Portal `surveys` table that produces a table of data for only the three Dipodomys species (`DM`, `DO`, `DS`). The species IDs should be presented as lower case, not upper case. The table should contain information on the date, the species ID, the weight and hindfoot length. The data should not include null values for either weight or hindfoot length. The table should be sorted first by the species (so that each species is grouped together) and then by weight, with the largest weights at the top.

Expected outputs for Portal Data Challenge: 1

Assignment submission & checklist