Repeating Things 1

Learning Objectives

Following this assignment students should be able to:

- use and create vectorized functions
- use the apply family of functions for iteration
- integrate custom functions with dplyr for iteration

Reading

Topics
- Iteration
- Style
Readings
- Introduction to apply, lapply, sapply, and tapply
- Hadley Wickham’s Style Guide

Lecture Notes

Place this code at the start of the assignment to load all the required packages.

library(dplyr)
library(ggplot2)
library(stringr)

Exercises

Size Estimates Vectorized (25 pts)

This is a followup to Use and Modify.
1. Write a function named mass_from_length_theropoda() that takes length as an argument and returns an estimate of mass for dinosaurs in the group Theropoda. Use the equation mass <- 0.73 * length^3.63. Copy the data below into R and pass the entire vector to your function to calculate the estimated mass for each dinosaur.
  
  theropoda_lengths <- c(17.8013631070471, 20.3764452071665, 14.0743486294308, 25.65782386974, 26.0952008049675, 20.3111541103134, 17.5663244372533, 11.2563431277577, 20.081903202614, 18.6071626441984, 18.0991894513166, 23.0659685685892, 20.5798853467837, 25.6179254233558, 24.3714331573996, 26.2847248252537, 25.4753783544473, 20.4642089867304, 16.0738256364701, 20.3494171706583, 19.854399305869, 17.7889814608919, 14.8016421998303, 19.6840911485379, 19.4685885050906, 24.4807784966691, 13.3359960054899, 21.5065994598917, 18.4640304608411, 19.5861532398676, 27.084751999756, 18.9609366301798, 22.4829168046521, 11.7325716149514, 18.3758846100456, 15.537504851634, 13.4848751773738, 7.68561192214935, 25.5963348603783, 16.588285389794)
2. Create a new version of the function named mass_from_length() to use the equation mass <- a * length^b and take length, a and b as arguments. In the function arguments, set the default values for a to 0.73 and b to 3.63. If you run this function with just the length data from Part 1, you should get the same result as Part 1. Copy the data below into R and call your function using the vector of lengths from Part 1 (above) and these vectors of a and b values to estimate the mass for the dinosaurs using different values of a and b.
  
  a_values <- c(0.759, 0.751, 0.74, 0.746, 0.759, 0.751, 0.749, 0.751, 0.738, 0.768, 0.736, 0.749, 0.746, 0.744, 0.749, 0.751, 0.744, 0.754, 0.774, 0.751, 0.763, 0.749, 0.741, 0.754, 0.746, 0.755, 0.764, 0.758, 0.76, 0.748, 0.745, 0.756, 0.739, 0.733, 0.757, 0.747, 0.741, 0.752, 0.752, 0.748)
  
  b_values <- c(3.627, 3.633, 3.626, 3.633, 3.627, 3.629, 3.632, 3.628, 3.633, 3.627, 3.621, 3.63, 3.631, 3.632, 3.628, 3.626, 3.639, 3.626, 3.635, 3.629, 3.642, 3.632, 3.633, 3.629, 3.62, 3.619, 3.638, 3.627, 3.621, 3.628, 3.628, 3.635, 3.624, 3.621, 3.621, 3.632, 3.627, 3.624, 3.634, 3.621)
3. Create a data frame from these data using dino_data <- data.frame(theropoda_lengths, a_values, b_values). Use dplyr to add a new masses column to this data frame (using mutate() and your function) and print the result to the console.
Expected outputs for Size Estimates Vectorized: 1
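As a rough illustration of the vectorization this exercise relies on, the sketch below uses a hypothetical function scale_power() and made-up vectors (none of these names come from the assignment). Arithmetic operators in R work element-wise, so a function built from them accepts whole vectors, recycles single-value defaults, and can be called directly inside mutate().

```r
library(dplyr)

# Hypothetical example function: arithmetic operators are vectorized,
# so x, a, and b can each be a single value or a vector.
scale_power <- function(x, a = 2, b = 3) {
  a * x^b
}

x_values <- c(1, 2, 3)

# The default a and b are recycled across every element of x
scale_power(x_values)
# 2 16 54

# Element-wise: 1 * 1^2, 2 * 2^2, 3 * 3^2
scale_power(x_values, a = c(1, 2, 3), b = c(2, 2, 2))
# 1 8 27

# The same function works on data frame columns inside mutate()
toy_data <- data.frame(x_values, a = c(1, 2, 3), b = c(2, 2, 2))
toy_data <- mutate(toy_data, result = scale_power(x_values, a, b))
print(toy_data)
```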
Size Estimates With Maximum (25 pts)

This is a follow-up to Part 1 of Size Estimates Vectorized.

Create a new version of your mass_from_length_theropoda() function from Part 1 of Size Estimates Vectorized called mass_from_length_max(). This function should only calculate a mass if the value of length passed to the function is less than 20. Otherwise (length of 20 or more) it should return NA instead. Use sapply() and this new function to estimate the mass for the theropoda_lengths data from Size Estimates Vectorized.
Expected outputs for Size Estimates With Maximum: 1
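The reason sapply() is needed here is that an if statement inspects only a single value, so a function built around if is not vectorized. The sketch below shows the pattern with a hypothetical function sqrt_if_positive() and made-up values, neither of which is part of the assignment. Calling sqrt_if_positive(values) directly on the whole vector fails with an error in recent versions of R (older versions only warn and use the first element).

```r
# Hypothetical example: `if` only inspects a single value, so this
# function is written for one number at a time.
sqrt_if_positive <- function(x) {
  if (x > 0) {
    result <- sqrt(x)
  } else {
    result <- NA
  }
  return(result)
}

values <- c(4, -1, 9)

# sapply() calls the function once per element and collects the
# results into a vector
roots <- sapply(values, sqrt_if_positive)
print(roots)
# 2 NA 3
```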
Size Estimates By Name Apply (25 pts)

This is a followup to Size Estimates by Name.

If the data on dinosaur lengths with species names is not in your working directory then download it. Import it using read.csv().

Remember the general form of the equation is:

mass <- a * length ^ b

Create a function get_mass_from_length_by_name() that takes two arguments, the length and the name of the dinosaur group. Inside this function use if/else if/else statements to check to see if the name is one of the following values and if so use the associated a and b values to estimate the species mass using these equations:
- Stegosauria: mass = 10.95 * length ^ 2.64 (Seebacher 2001)
- Theropoda: mass = 0.73 * length ^ 3.63 (Seebacher 2001)
- Sauropoda: mass = 214.44 * length ^ 1.46 (Seebacher 2001)
If the name is not any of these values the function should return NA.
1. Use this function and mapply() to calculate the estimated mass for each dinosaur. You’ll need to pass the data to mapply() as single vectors or columns, not the whole data frame.
2. Using dplyr, add a new masses column to the data frame (using rowwise(), mutate() and your function) and print the result to the console.
3. Using ggplot, make a histogram of dinosaur masses with one subplot for each species (using facet_wrap()).
Expected outputs for Size Estimates By Name Apply: 1 2
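When a non-vectorized function takes more than one argument, mapply() and rowwise() are two ways to call it once per row. The sketch below shows both on a toy case. The function area_by_shape() and the shapes data frame are hypothetical and stand in for the dinosaur function and data.

```r
library(dplyr)

# Hypothetical example function: picks a formula based on a name
area_by_shape <- function(size, shape) {
  if (shape == "square") {
    area <- size^2
  } else if (shape == "circle") {
    area <- pi * size^2
  } else {
    area <- NA
  }
  return(area)
}

shapes <- data.frame(
  size = c(2, 1, 3),
  shape = c("square", "circle", "hexagon")
)

# mapply() walks along both vectors in parallel, one pair at a time
areas <- mapply(area_by_shape, shapes$size, shapes$shape)

# rowwise() makes mutate() evaluate the function one row at a time
shapes_with_area <- shapes %>%
  rowwise() %>%
  mutate(area = area_by_shape(size, shape))
print(shapes_with_area)
```

In both versions the unmatched name ("hexagon") produces NA, which mirrors the fallback behavior the exercise asks for.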
Crown Volume Calculation (25 pts)

The UHURU experiment in Kenya has conducted a survey of Acacia and other tree species in ungulate exclosure treatments. The tree data are available here in a tab-delimited ("\t") format. Each of the individuals surveyed was measured for tree height (HEIGHT) and canopy size in two directions (AXIS_1 and AXIS_2). Read these data in using the following code:
```
tree_data <- read.csv("https://ndownloader.figshare.com/files/5629536",
                 sep = '\t',
                 na.strings = c("dead", "missing", "MISSING",
                                "NA", "?", "3.3."))
```
You want to estimate the crown volumes for the different species and have developed equations for species in the Acacia genus:
```
volume = 0.16 * HEIGHT^0.8 * pi * AXIS_1 * AXIS_2
```
and the Balanites genus:
```
volume = 1.2 * HEIGHT^0.26 * pi * AXIS_1 * AXIS_2
```
For all other genera you’ll use a general equation developed for trees:
```
volume = 0.5 * HEIGHT^0.6 * pi * AXIS_1 * AXIS_2
```
1. Write a function called tree_volume_calc that calculates the canopy volume for the Acacia species in the dataset. To do so, use an if statement in combination with the str_detect() function from the stringr R package. The code str_detect(SPECIES, "Acacia") will return TRUE if the string stored in this variable contains the word “Acacia” and FALSE if it does not. This function will have to take the following arguments as input: SPECIES, HEIGHT, AXIS_1, AXIS_2. Then run the following line:
  
  tree_volume_calc("Acacia_brevispica", 2.2, 3.5, 1.12)
2. Expand this function to additionally calculate canopy volumes for other types of trees in this dataset by adding if/else statements and including the volume equations for the Balanites genus and other genera. Then run the following lines:
  
  tree_volume_calc("Balanites", 2.2, 3.5, 1.12)
  tree_volume_calc("Croton", 2.2, 3.5, 1.12)
3. Now get the canopy volumes for all the trees in the tree_data dataframe and add them as a new column to the data frame. You can do this using tree_volume_calc() and either mapply() or using dplyr with rowwise and mutate.
Expected outputs for Crown Volume Calculation: 1 2 3
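To show how str_detect() can drive an if/else if/else chain, here is a minimal sketch that returns only a label rather than a volume. The function genus_group() and its return values are hypothetical and are not part of the exercise.

```r
library(stringr)

# Hypothetical example: choose a branch based on a substring match
genus_group <- function(species_name) {
  if (str_detect(species_name, "Acacia")) {
    group <- "acacia"
  } else if (str_detect(species_name, "Balanites")) {
    group <- "balanites"
  } else {
    group <- "other"
  }
  return(group)
}

genus_group("Acacia_brevispica")  # "acacia"
genus_group("Balanites")          # "balanites"
genus_group("Croton")             # "other"
```

Note that str_detect() returns NA for a missing string, and if cannot evaluate NA, so a function like this stops with an error if it is given a missing species name.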
Tree Biomass Challenge (optional)

Understanding the total amount of biomass (the total mass of all individuals) in forests is important for understanding the global carbon budget and how the earth will respond to increases in carbon dioxide emissions.

We don’t normally measure the mass of a tree, but take a measurement of the diameter or circumference of the trunk and then estimate mass using equations like M = 0.124 * D^2.53.

1. Estimate tree biomass for each species in a 96 hectare area of the Western Ghats in India using the following steps.
- If the file ramesh2010-macroplots.csv isn’t already in your working directory then download a copy.
- Load the data into R.
- Write a function that takes a vector of tree diameters as an argument and returns a vector of tree masses.
- Create a dplyr pipeline that
  - Adds a new column (using mutate and your function) that contains masses calculated from the diameters
  - Groups the data frame into species using the SpCode column
  - And then calculates biomass (i.e., the sum of the masses) for each species (using summarize)
  - Stores the result as a data frame
- Display the resulting data frame
2. Plot a histogram of the species biomass values you just calculated.
- Use 10 bins in the histogram (using the bins argument)
- Use a log10 scale for the x axis (using scale_x_log10)
- Change the x axis label to Biomass and the y axis label to Number of Species (using labs)
Expected outputs for Tree Biomass Challenge: 1 2
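The pipeline steps in part 1 follow a common mutate, group, summarize pattern. The sketch below shows that pattern on made-up data. The data frame toy_trees, its columns, and the function double_it() are hypothetical and deliberately do not use the biomass equation.

```r
library(dplyr)

# Made-up data: a group code and a measurement for each individual
toy_trees <- data.frame(
  code = c("A", "A", "B", "B", "B"),
  diameter = c(10, 20, 5, 5, 10)
)

# Hypothetical vectorized conversion (not the biomass equation)
double_it <- function(x) {
  2 * x
}

totals <- toy_trees %>%
  mutate(doubled = double_it(diameter)) %>%  # new column from the function
  group_by(code) %>%                          # one group per code
  summarize(total = sum(doubled)) %>%         # one row per group
  as.data.frame()                             # store as a plain data frame

print(totals)
# code A has total 60, code B has total 40
```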
Tree Growth (optional)

The UHURU experiment in Kenya has conducted a survey of Acacia and other tree species in ungulate exclosure treatments. Each of the individuals surveyed was measured for tree height (HEIGHT), circumference (CIRC) and canopy size in two directions (AXIS_1 and AXIS_2). If the file TREE_SURVEYS.txt isn’t already in your working directory, download the data file here.

Read the data in using the following code:
```
tree_data <- read.csv("https://ndownloader.figshare.com/files/5629536",
                 sep = '\t',
                 na.strings = c("dead", "missing", "MISSING",
                                "NA", "?", "3.3."))
```
1. Write a function named get_growth() that takes two inputs, a vector of sizes and a vector of years, and calculates the average annual growth rate. Pseudo-code for calculating this rate is (size_in_last_year - size_in_first_year) / (last_year - first_year). Test this function by running get_growth(c(40.2, 42.6, 46.0), c(2020, 2021, 2022)).
2. Use dplyr and this function to get the growth for each individual tree along with information about the TREATMENT that tree occurs on. Trees are identified by a unique value in the ORIGINAL_TAG column. Don’t include information for cases where a TREATMENT is not known (e.g., where it is NA).
3. Using ggplot and the output from (2), make a histogram of growth rates for each TREATMENT, with each TREATMENT in its own facet. Use geom_vline() to add a vertical line at 0 to help indicate which trees are getting bigger vs. smaller. Include good axis labels.
4. Create a single function called compare_growth() that combines your work in (2) and (3). It should take the arguments: df (the data frame being used), measure (the column that contains the size measurement to measure growth on; we used CIRC), tag_column (the name of the column with the unique tag; we used ORIGINAL_TAG), sample_column (the name of the column indicating different samples; we used YEAR), and facet_column (the name of the column to use to determine which groups to make histograms for; we used TREATMENT). Use the function to recreate your original plot using compare_growth(tree_data, CIRC, ORIGINAL_TAG, YEAR, TREATMENT). Then use the function to create a similar plot showing growth faceted by SPECIES, using SURVEY as the sample_column, and AXIS_1 as the measure by running compare_growth(tree_data, AXIS_1, ORIGINAL_TAG, SURVEY, SPECIES).
Expected outputs for Tree Growth: 1 2 3 4
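Part 4 calls compare_growth() with unquoted column names, so the function has to forward those names to dplyr. One way to do this, assuming a reasonably recent dplyr (1.0 or later), is the embrace operator {{ }}. The sketch below shows it on made-up data. The helper range_by_group() and the toy_data columns are hypothetical and are not part of the assignment.

```r
library(dplyr)

# Hypothetical helper: column names are passed unquoted and
# forwarded to dplyr verbs with the embrace operator {{ }}
range_by_group <- function(df, value_column, group_column) {
  df %>%
    group_by({{ group_column }}) %>%
    summarize(value_range = max({{ value_column }}) - min({{ value_column }}))
}

toy_data <- data.frame(
  site = c("a", "a", "b", "b"),
  height = c(1, 4, 2, 7)
)

ranges <- range_by_group(toy_data, height, site)
print(ranges)
# site a has value_range 3, site b has value_range 5
```

For faceting, ggplot2 accepts the same operator inside vars(), for example facet_wrap(vars({{ facet_column }})).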

Assignment submission & checklist
