Basic for loop
 Loops are the fundamental structure for repetition in programming
 for loops perform the same action for each item in a list of things
for (item in list_of_items) {
do_something(item)
}
 To see an example of this let’s calculate masses from volumes using a loop
 Need print() to display values inside a loop or function
volumes = c(1.6, 3, 8)
for (volume in volumes){
mass <- 2.65 * volume ^ 0.9
print(mass)
}
 Code in the loop will run once for each value in volumes
 Everything between the curly brackets is executed each time through the loop
 Code takes the first value from volumes and assigns it to volume and does the calculation and prints it
 Then it takes the second value from volumes and assigns it to volume and does the calculation and prints it
 And so on
 So, this loop does the same exact thing as
volume <- volumes[1]
mass <- 2.65 * volume ^ 0.9
print(mass)
volume <- volumes[2]
mass <- 2.65 * volume ^ 0.9
print(mass)
volume <- volumes[3]
mass <- 2.65 * volume ^ 0.9
print(mass)
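The point about needing print() can be checked directly. The sketch below is only an illustration: it uses capture.output() to collect whatever each loop displays, and the names silent and shown are made up for this example.

```r
volumes <- c(1.6, 3, 8)

# Without print(): the value is calculated on each trip but never displayed
silent <- capture.output(
  for (volume in volumes) {
    2.65 * volume ^ 0.9
  }
)

# With print(): one line of output for each trip through the loop
shown <- capture.output(
  for (volume in volumes) {
    print(2.65 * volume ^ 0.9)
  }
)

length(silent)
length(shown)
```

The first loop collects no lines of output and the second collects one line per value in volumes.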
Do Tasks 1 & 2 in Basic For Loops.
Looping with an index & storing results
 R loops iterate over a series of values in a vector or other list-like object
 When we use that value directly this is called looping by value
 But there is another way to loop, which is called looping by index
 Looping by index loops over a list of integer index values, typically starting at 1
 These integers are then used to access values in one or more vectors at the position indicated by the index
 If we modified our previous loop to use an index it would look like this
 We often use i to stand for “index” as the variable we update with each step through the loop
volumes = c(1.6, 3, 8)
for (i ...)
 We then create a vector of position values starting at 1 (for the first value) and ending with the length of the object we are looping over
volumes = c(1.6, 3, 8)
for (i in 1:3)
 We don’t want to have to know the length of the vector and it might change in the future, so we’ll look it up using the length() function
volumes = c(1.6, 3, 8)
for (i in 1:length(volumes)){
}
 Then inside the loop, instead of doing the calculation on the index (which is just a number between 1 and 3 in our case), we use square brackets and the index to get the appropriate value out of our vector
volumes = c(1.6, 3, 8)
for (i in 1:length(volumes)){
mass <- 2.65 * volumes[i] ^ 0.9
print(mass)
}
 This gives us the same result, but it’s more complicated to understand
 So why would we loop by index?

 The advantage to looping by index is that it lets us do more complicated things
 One of the most common things we use this for is storing the results we calculated in the loop
 To do this we start by creating an empty object the same length as the results will be before the loop starts
 To store results in a vector we use the function vector to create an empty vector of the right length
 mode is the type of data we are going to store
 length is the length of the vector
masses <- vector(mode = "numeric", length = length(volumes))
masses
 Then add each result in the right position in this vector
 For each trip through the loop put the output into the empty vector at the ith position
for (i in 1:length(volumes)){
mass <- 2.65 * volumes[i] ^ 0.9
masses[i] <- mass
}
masses
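To make the "empty vector" idea concrete, here is a small sketch of what vector() returns for a few modes, followed by the same storage pattern as above. The names empty_numeric, empty_character, empty_logical and masses_before are made up for this illustration.

```r
# "Empty" vectors are filled with a default value that depends on the mode
empty_numeric <- vector(mode = "numeric", length = 3)      # zeros
empty_character <- vector(mode = "character", length = 3)  # empty strings
empty_logical <- vector(mode = "logical", length = 3)      # FALSE values

volumes <- c(1.6, 3, 8)
masses <- vector(mode = "numeric", length = length(volumes))
masses_before <- masses  # copy taken before the loop: still all zeros

for (i in 1:length(volumes)) {
  # Overwrite the placeholder at position i with the calculated mass
  masses[i] <- 2.65 * volumes[i] ^ 0.9
}

masses_before
masses
```

Creating the vector at its full length first means each trip through the loop only has to fill in one position.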
 Walk through iteration in debugger
Do Tasks 3-4 in Basic For Loops.
End of 1 hour class
Looping over multiple values
 Looping with an index also allows us to access values from multiple vectors
as <- c(2.65, 1.28, 3.29)
bs <- c(0.9, 1.1, 1.2)
volumes = c(1.6, 3, 8)
masses <- vector(mode="numeric", length=length(volumes))
for (i in 1:length(volumes)){
mass <- as[i] * volumes[i] ^ bs[i]
masses[i] <- mass
}
}
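Looping by index over several vectors quietly assumes they all have the same length. The sketch below adds a length check before the loop as one possible safeguard (an addition for illustration, not part of the lesson code) and shows what indexing past the end of a vector returns. The names vectorized and out_of_range are made up.

```r
as <- c(2.65, 1.28, 3.29)
bs <- c(0.9, 1.1, 1.2)
volumes <- c(1.6, 3, 8)

# Stop early if the vectors cannot be matched up position by position
stopifnot(length(as) == length(volumes), length(bs) == length(volumes))

masses <- vector(mode = "numeric", length = length(volumes))
for (i in 1:length(volumes)) {
  # The same index i picks the matching value from each vector
  masses[i] <- as[i] * volumes[i] ^ bs[i]
}

# The same calculation written without a loop, for comparison
vectorized <- as * volumes ^ bs
all.equal(masses, vectorized)

# Indexing past the end of a vector gives NA rather than an error
out_of_range <- volumes[4]
out_of_range
```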
Do Task 5 in Basic For Loops.
Looping with functions
 It is common to combine loops with functions by calling one or more functions as a step in our loop
 For example, let’s take the non-vectorized version of our est_mass function that returns an estimated mass if the volume < 5 and NA if it’s not.
est_mass_max <- function(volume, a, b){
if (volume < 5) {
mass <- a * volume ^ b
} else {
mass <- NA
}
return(mass)
}
 We can’t pass the vector to the function and get back a vector of results because of the if statements
 So let’s loop over the values
 First we’ll create an empty vector to store the results
 And then loop by index, calling the function for each value of volumes
masses <- vector(mode="numeric", length=length(volumes))
for (i in 1:length(volumes)){
mass <- est_mass_max(volumes[i], as[i], bs[i])
masses[i] <- mass
}
 This is the for loop equivalent of an mapply statement
masses_apply <- mapply(est_mass_max, volumes, as, bs)
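As a quick check that the two approaches agree, the sketch below runs both on the same inputs and compares the results. It repeats the definitions from above so that it can be run on its own.

```r
est_mass_max <- function(volume, a, b) {
  if (volume < 5) {
    mass <- a * volume ^ b
  } else {
    mass <- NA
  }
  return(mass)
}

as <- c(2.65, 1.28, 3.29)
bs <- c(0.9, 1.1, 1.2)
volumes <- c(1.6, 3, 8)

# Loop version: call the function once per position and store the result
masses <- vector(mode = "numeric", length = length(volumes))
for (i in 1:length(volumes)) {
  masses[i] <- est_mass_max(volumes[i], as[i], bs[i])
}

# mapply version: R steps through the three vectors in parallel for us
masses_apply <- mapply(est_mass_max, volumes, as, bs)

all.equal(masses, masses_apply)
masses
```

The last value is NA in both versions because the third volume (8) is not less than 5.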
Looping over data frames
 By default when R loops over a data frame it loops over the columns
data <- data.frame(a = as, b = bs, volume = volumes)
for (i in data) {
print(i)
}
 To loop over rows, loop by index and subset
for (i in 1:nrow(data)) {
print(data[i, ])
}
 If we want to use a specific column
masses <- vector(mode="numeric", length=length(volumes))
for (i in 1:nrow(data)) {
mass <- est_mass_max(data[i, "volume"], data[i, "a"], data[i, "b"])
masses[i] <- mass
}
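It can help to look at what kind of object we get back in each case. This sketch rebuilds the same small data frame and inspects a column from the by-value loop, a row subset, and a single cell. The names column_classes, position, first_row and one_value are made up for this illustration.

```r
data <- data.frame(a = c(2.65, 1.28, 3.29),
                   b = c(0.9, 1.1, 1.2),
                   volume = c(1.6, 3, 8))

# Looping by value over a data frame hands us one whole column at a time
column_classes <- vector(mode = "character", length = ncol(data))
position <- 1
for (column in data) {
  column_classes[position] <- class(column)
  position <- position + 1
}
column_classes

# Subsetting a row by index returns a one-row data frame
first_row <- data[1, ]
is.data.frame(first_row)

# Adding a column name returns the single value we can pass to a function
one_value <- data[1, "volume"]
one_value
```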
Looping over files
 Repeat same actions on many similar files
 Let’s download some simulated satellite collar data
download.file("http://www.datacarpentry.org/semester-biology/data/locations.zip",
"locations.zip")
unzip("locations.zip")
 Now we need to get the names of each of the files we want to loop over
 We do this using list.files()
 If we run it without arguments it will give us the names of all files in the directory
list.files()
 But we just want the data files so we’ll add the optional pattern argument to only get the files with "locations" in their names
data_files = list.files(pattern = "locations")
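The pattern argument is a regular expression that can match anywhere in a file name. The sketch below explores this in a temporary folder. All file names here are hypothetical and created only for the illustration. They are not the names inside locations.zip.

```r
# Make a scratch folder with a few empty, hypothetically named files
tmp_dir <- file.path(tempdir(), "pattern_demo")
dir.create(tmp_dir, showWarnings = FALSE)
demo_names <- c("locations-a.csv", "locations-b.csv", "locations.zip", "notes.txt")
file.create(file.path(tmp_dir, demo_names))

# A bare word matches every name that contains it, including the zip file
all_matches <- list.files(tmp_dir, pattern = "locations")
all_matches

# Anchoring the start (^) and the extension ($) narrows the match
csv_only <- list.files(tmp_dir, pattern = "^locations.*\\.csv$")
csv_only
```

If a downloaded archive sits in the same directory as the data files, a narrower pattern like the second one keeps it out of the loop.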
 Once we have this list we can loop over it to count the number of observations in each file
 First create an empty vector to store those counts
num_files = length(data_files)
results <- vector(mode = "integer", length = num_files)
 Then write our loop
for (i in 1:num_files){
filename <- data_files[i]
data <- read.csv(filename)
count <- nrow(data)
results[i] <- count
}
Do Task 1 of Multiple-file Analysis. Exercise uses different collar data
Storing loop results in a data frame
 We often want to calculate multiple pieces of information in a loop making it useful to store results in things other than vectors
 We can store them in a data frame instead by creating an empty data frame and storing the results in the ith row of the appropriate column
 Associate the file name with the count
 Also store the minimum latitude
 Start by creating an empty data frame
 Use the data.frame function
 Provide one argument for each column
 “Column Name” = “an empty vector of the correct type”
results <- data.frame(file_name = vector(mode = "character", length = num_files),
count = vector(mode = "integer", length = num_files),
min_lat = vector(mode = "numeric", length = num_files))
 Now let’s modify our loop from last time
 Instead of storing count in results[i] we need to first specify the count column using the $: results$count[i]
 The loop below uses the equivalent row and column form, results[i, "count"]
 We also want to store the filename, which is data_files[i]
for (i in 1:num_files){
filename <- data_files[i]
data <- read.csv(filename)
count <- nrow(data)
min_lat = min(data$lat)
results[i, "file_name"] <- filename
results[i, "count"] <- count
results[i, "min_lat"] <- min_lat
}
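The same pattern can be tried without downloading anything. The sketch below writes two tiny files to a temporary folder and then fills a results data frame from them. The file names (collar-1.csv, collar-2.csv) and latitude values are invented for this illustration, and stringsAsFactors = FALSE is added so the file_name column stays character on older versions of R.

```r
# Write two small, made-up data files to a scratch folder
tmp_dir <- file.path(tempdir(), "collar_demo")
dir.create(tmp_dir, showWarnings = FALSE)
write.csv(data.frame(lat = c(10.1, 10.5)),
          file.path(tmp_dir, "collar-1.csv"), row.names = FALSE)
write.csv(data.frame(lat = c(11.0, 11.2, 11.4)),
          file.path(tmp_dir, "collar-2.csv"), row.names = FALSE)

data_files <- list.files(tmp_dir, pattern = "\\.csv$", full.names = TRUE)
num_files <- length(data_files)

# One empty column of the right type for each piece of information
results <- data.frame(file_name = vector(mode = "character", length = num_files),
                      count = vector(mode = "integer", length = num_files),
                      min_lat = vector(mode = "numeric", length = num_files),
                      stringsAsFactors = FALSE)

for (i in 1:num_files) {
  data <- read.csv(data_files[i])
  # Fill row i, one column at a time
  results[i, "file_name"] <- basename(data_files[i])
  results[i, "count"] <- nrow(data)
  results[i, "min_lat"] <- min(data$lat)
}
results
```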
Do Task 2 of Multiple-file Analysis. Exercise uses different collar data
Subsetting Data (optional)
 Loops can subset in ways that are difficult with things like group_by
 Look at some data on trees from the National Ecological Observatory Network
library(ggplot2)
library(dplyr)
neon_trees <- read.csv('data/HARV_034subplt.csv')
ggplot(neon_trees, aes(x = easting, y = northing)) +
geom_point()
 Look at a north-south gradient in number of trees
 Need to know number of trees in each band of y values
 Start by defining the size of the window we want to use
 Use the grid lines which are 2.5 m
window_size <- 2.5
 Then figure out the edges for each window
south_edges <- seq(4713095, 4713117.5, by = window_size)
north_edges <- south_edges + window_size
 But we don’t want to go all the way to the far edge
south_edges <- seq(4713095, 4713117.5 - window_size, by = window_size)
north_edges <- south_edges + window_size
 Set up an empty vector to store the output
counts <- vector(mode = "numeric", length = length(south_edges))
 Loop over the south edges and subset the data occurring within each window
for (i in 1:length(south_edges)) {
data_in_window <- filter(neon_trees, northing >= south_edges[i], northing < north_edges[i])
counts[i] <- nrow(data_in_window)
}
counts
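The windowing logic can be tried on a tiny made-up dataset where the answer is easy to count by hand. This sketch uses base R subsetting in place of dplyr's filter so it runs without loading any packages. The trees data frame and its coordinates are invented for the illustration.

```r
# Six made-up trees along a 10 m north-south line
trees <- data.frame(northing = c(0.5, 1.0, 3.0, 6.0, 7.4, 9.9))

window_size <- 2.5
south_edges <- seq(0, 10 - window_size, by = window_size)  # 0, 2.5, 5, 7.5
north_edges <- south_edges + window_size

counts <- vector(mode = "numeric", length = length(south_edges))
for (i in 1:length(south_edges)) {
  # Keep trees at or above the south edge and strictly below the north edge
  in_window <- trees$northing >= south_edges[i] & trees$northing < north_edges[i]
  data_in_window <- trees[in_window, , drop = FALSE]
  counts[i] <- nrow(data_in_window)
}
counts
```

Because each window includes its south edge and excludes its north edge, every tree falls in exactly one window, so the counts add up to the number of trees.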
Nested Loops (optional)
 Sometimes need to loop over multiple things in a coordinated fashion
 Pass a window over some spatial data

 Look at full spatial pattern not just north-south gradient
 Basic nested loops work by putting one loop inside another one
for (i in 1:10) {
for (j in 1:5) {
print(paste("i = " , i, "; j = ", j))
}
}
 Loop over x and y coordinates to create boxes
 Need west and east edges
west_edges <- seq(731752.5, 731772.5 - window_size, by = window_size)
east_edges <- west_edges + window_size
 Redefine our storage
output <- matrix(nrow = length(south_edges), ncol = length(west_edges))
for (i in 1:length(south_edges)) {
for (j in 1:length(west_edges)) {
data_in_window <- filter(neon_trees,
northing >= south_edges[i], northing < north_edges[i],
easting >= west_edges[j], easting < east_edges[j])
output[i, j] <- nrow(data_in_window)
}
}
output
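Here is the nested version on a made-up 2 m by 2 m plot, small enough to check by hand. As a simplification for this sketch, it counts matching rows with sum() on a logical vector instead of filtering a data frame. The trees data and the 1 m window size are invented for the illustration.

```r
# Five made-up trees in a 2 m by 2 m plot
trees <- data.frame(easting = c(0.5, 0.7, 1.5, 1.7, 0.2),
                    northing = c(0.5, 0.1, 0.5, 1.5, 1.9))

window_size <- 1
south_edges <- seq(0, 2 - window_size, by = window_size)
north_edges <- south_edges + window_size
west_edges <- seq(0, 2 - window_size, by = window_size)
east_edges <- west_edges + window_size

# Rows are north-south windows, columns are west-east windows
output <- matrix(nrow = length(south_edges), ncol = length(west_edges))
for (i in 1:length(south_edges)) {
  for (j in 1:length(west_edges)) {
    in_window <- trees$northing >= south_edges[i] & trees$northing < north_edges[i] &
      trees$easting >= west_edges[j] & trees$easting < east_edges[j]
    # Summing a logical vector counts how many values are TRUE
    output[i, j] <- sum(in_window)
  }
}
output
```

The inner loop runs through every column for each row, so the body runs once per box and fills in the whole matrix.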
Sequence along (optional)
 seq_along(volumes) generates a vector of numbers from 1 to length(volumes)
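One case where the two forms differ is an empty vector. The sketch below compares them. The names empty, trips_colon and trips_seq_along are made up for this illustration.

```r
volumes <- c(1.6, 3, 8)
seq_along(volumes)   # same positions as 1:length(volumes)

# With nothing to loop over, 1:length() counts down from 1 to 0
empty <- c()
1:length(empty)
seq_along(empty)

# Count how many trips each style of loop makes over the empty vector
trips_colon <- 0
for (i in 1:length(empty)) {
  trips_colon <- trips_colon + 1
}

trips_seq_along <- 0
for (i in seq_along(empty)) {
  trips_seq_along <- trips_seq_along + 1
}

trips_colon
trips_seq_along
```

The 1:length() loop runs twice on an empty vector, while the seq_along() loop does not run at all.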