Computational Projects

Learning Objectives

Following this assignment students should be able to:

properly structure a computational project

use good style

start to build more complex computational tasks

Reading

How to Google programming problems effectively

Lecture Notes

Exercises

Climate Space (40 pts)

Understanding how environmental factors influence species distributions can be aided by determining which areas of the available climate space a species currently occupies. You are interested in showing how much and what part of the available global temperature and precipitation range is occupied by some common tree species. Create three graphs, one each for Quercus alba, Picea glauca, and Ceiba pentandra. Each graph should show a scatterplot of the mean annual temperature and mean annual precipitation for points around the globe and highlight the values for 1000 locations of the plant species. Start by decomposing this exercise into small manageable pieces.

Here are some tips that will be helpful along the way:
- Climate data data is available from the WorldClim dataset. Using climate <- getData('worldclim', var ='bio', res = 10) (from the raster package) will download all of the bioclim variables. The two variables you need are bio1 (temperature) and bio12 (precipitation). If the website is down you can download a copy from the course site by downloading http://www.datacarpentry.org/semester-biology/data/wc10.zip and unzipping it into your home directory (/home/username on Mac and Linux, C:\Users\username\Documents on Windows).
- There are over 500,000 global data points which can make plotting slow. You can choose to plot a random subset of 10,000 points (e.g., using sample_n from the dplyr package) to limit the time it takes to generate.
- Choose good labels and make the points transparent to see their density.
- You might notice that the temperature values seem large. Storing decimal values uses more space than integers, so the WorldClim creators provide temperature values multiplied by 10. For example, 19.5 is stored as 195. Make sure to display the actual temperatures, not the raw values provided.
- Species’ occurrence data is available from GBIF using the spocc package. An example of how to get the data you need is available in the Species Occurrences Map exercise.
- To extract climate values for each occurrence from the climate data you will need a dataframe of occurrences that only only contains longitude and latitude columns.
- If the projections for WorldClim and the species occurrence data aren’t the same you will need a SpatialPointsDataframe.
- There are 19 bioclim variables that are stored together in a “raster stack”. You can either: 1) run extract on the full object returned by getData and then run data.frame on the result. This will produce a table with one row for each species location and one column for each bioclim variable; or 2) Get the data for a single bioclim variable using the $, e.g., climate$bio1, and run extract on this single raster.
Challenge (optional): If you want to challenge yourself trying making a single plot with all three species, either all on the same plot of split over three faceted subplots.
Expected outputs for Climate Space: 1 2 3
Megafaunal Extinction (60 pts)

There were a relatively large number of extinctions of mammalian species roughly 10,000 years ago. To help understand why these extinctions happened scientists are interested in understanding if there were differences in the size of the species that went extinct and those that did not. You are going to reproduce the three main figures from one of the major papers on this topic Lyons et al. 2004.

You will do this using a large dataset of mammalian body sizes that has data on the mass of recently extinct mammals as well as extant mammals (i.e., those that are still alive today).
1. Import the data into R. As with most real world data there are a some things about the dataset that you’ll need to identify and address during the import process. Print out the structure of the resulting data frame.
2. Create a plot showing histograms of masses for mammal species that are still present and those that went extinct during the pleistocene (extant and extinct in the status column). There should be one sub-plot for each continent and that sub-plot should show the histograms for both groups as a stacked histogram. To match the original analysis don’t include islands (Insular and Oceanic in the continent column) and or the continent labeled EA (because EA had no species that went extinct in the pleistocene). Scale the x-axis logarithmically and use 25 bins to roughly match the original figure. Use good axis labels.
3. The 2nd figure in the original paper looks in more detail at two orders, Xenarthra and Carnivora, which showed extinctions in North and South America. Create a figure similar to the one in Part 2, but that shows 4 sub-plots, one for each order on each of the two continents. Still scale the x-axis logarithmically, but use 19 bins to roughly match the original figure.
4. The 3rd figure in the original paper explores Australia as a case study. Australia is interesting because there is good data on both Pleistocene extinctions (extinct in the status column) and more modern extinctions occurring over the last 300 years (historical in the status column). Make single stacked histogram that compares the sizes of extinct, extant, and historical statuses. Scale the x-axis logarithmically and use 25 bins to roughly match the original figure. Use good axis labels.
5. (optional) Instead of excluding continent EA by name in your analysis (in part 2), modify your code to determine from the data which continents had species that went extinct in the pleistocene and only include those continents.
Expected outputs for Megafaunal Extinction: 1 2 3 4

Assignment submission & checklist

Assignment

Learning Objectives

Reading

Lecture Notes

Exercises

Climate Space (40 pts)

Megafaunal Extinction (60 pts)