### Learning Objectives

Following this assignment students should be able to:

• connect to a remote database and execute simple queries
• integrate database and R workflow
• export output data from R to database
• tidy data table with redundant fields or overfilled cells

• Topics

• Cleaning messy data using tidyr

tidyr

### Exercises

1. #### Tree Biomass (100 pts)

Estimating the total amount of biomass (the total mass of all individuals) in forests is important for understanding the global carbon budget and how the earth will respond to increases in carbon dioxide emissions. We can estimate the mass of a tree based on its diameter.

There are lots of equations for estimating the mass of a tree from its diameter, but one good option is the equation:

Mass = 0.124 * Diameter2.53

where `Mass` is measured in kg of dry above-ground biomass and `Diameter` is in cm DBH (Brown 1997).

We’re going to estimate the total tree biomass for trees in a 96 hectare area of the Western Ghats in India. The data needs to be tidied before all of the tree stems can be used for analysis. f If the `Macroplot_data_Rev.txt` is not already in your working directory download a copy.

1. Use `pivot_longer()` to create a longer data frame with one row for each measured stem. Use dplyr’s `filter` function to remove all of the girths that are zero. Store this longer data frame in a variable and also display it.
2. Write a function that takes a vector of tree diameters as an argument and
returns a vector of tree masses using the equation above. Test it using `mass_from_diameter(22)`.
3. Stems are measured in girth (i.e., circumference) rather than diameter. Write a function that takes a vector of circumferences as an argument and returns a vector of diameters (`diameter = circumference / pi`). Test it using `diameter_from_circumference(26)`.
4. Use the two functions you’ve written to and dplyr to add a `mass` column to your longer data frame. Store this data in a variable and display it.
5. Estimate the total biomass by summing the mass of all of the stems in dataset.
6. `separate()` the `SpCode` column into `GenusCode` and `SpEpCode` columns and then use `group_by` and `summarize` to the total biomass for each unique `GenusCode`.
7. Use ggplot to make a histogram of the `diameter` values. Make the x label `"Diameter [cm]` and the y label `"Number of Stems"`
Expected outputs for Tree Biomass: 1

Assignment submission & checklist