Iteration without Loops in R

Repetition in R

Computers are great at doing things repeatedly
We’ve learned to use functions to find mass for one volume

est_mass <- function(volume){
  mass <- 2.65 * volume^0.9
  return(mass)
}

est_mass(1.6)

This makes it easier to find mass for other volumes

est_mass(5.6)
est_mass(3.1)

But this is tedious, error-prone, and impractical for large numbers of volumes
There are multiple ways to do something repeatedly in R and we’ll talk about all of them over the next several lessons
These include
- Vectorize - where we write functions that take a vector of values, do elementwise calculations, and return a vector of results
- Using Apply/Map - which takes a function and applies it to each item in a list of items
- Combining our own functions with dplyr - which we can do using both vectorized and non-vectorized functions
- Loops - which provide us with complete control to perform any kind of repetition we want

Vectorize

Write functions that take a vector of values, do elementwise calculations, and return a vector of the results
Any function that only uses vectorized calculations is itself vectorized
E.g., vector math

c(1, 2, 3) * 2

Our current function already works on a vector

est_mass <- function(volume){
  mass <- 2.65 * volume ^ 0.9
  return(mass)
}

volumes = c(1.6, 5.6, 3.1)
est_mass(volumes)

Many functions in R are vectorized, which means that we can often repeat things using only this vectorization
We can also use vectorized functions with data frames by sending the individual columns of a data frame to the function

data <- data.frame(volumes, plant_id = c(1, 2, 3))
data
est_mass(data$volumes)
est_mass(data[['volumes']])
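
A quick check of what "vectorized" means in practice. This sketch uses a few built-in functions (sqrt(), round(), and nchar(), picked here only for illustration) and confirms that both ways of pulling a column out of a data frame hand the function the same vector.

```r
volumes <- c(1.6, 5.6, 3.1)

# Built-in functions that work elementwise on a whole vector
roots <- sqrt(volumes)            # one square root per volume
rounded <- round(volumes)         # one rounded value per volume
name_lengths <- nchar(c("shrub", "tree", "grass"))

# Same function as in the lesson
est_mass <- function(volume){
  mass <- 2.65 * volume ^ 0.9
  return(mass)
}

data <- data.frame(volumes, plant_id = c(1, 2, 3))

# Both column-access styles return the same numeric vector,
# so the results are the same
from_dollar <- est_mass(data$volumes)
from_brackets <- est_mass(data[['volumes']])
identical(from_dollar, from_brackets)
```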

Do Size Estimates Vectorized 1.

Multiple Arguments

Let’s modify our function to take the coefficient (the value that is currently set as 2.65) as an argument
We’ll call it a

est_mass_coef <- function(volume, a){
  mass <- a * volume ^ 0.9
  return(mass)
}

est_mass_coef(volumes, 2.56)

Because we only provided a single value of a, that value gets used for every value of volume when doing the calculation
But multiplication is also vectorized for two vectors

c(1, 2, 3) * c(1, 2, 3)

So we can also pass the function a vector of values for a

as <- c(2.56, 1, 3.2)
est_mass_coef(volumes, as)
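
To make the two cases concrete, here is a small sketch that spells out what R is doing. The names single_a, repeated_a, paired, and by_hand are just labels for this example.

```r
est_mass_coef <- function(volume, a){
  mass <- a * volume ^ 0.9
  return(mass)
}

volumes <- c(1.6, 5.6, 3.1)
as <- c(2.56, 1, 3.2)

# A single value of a is reused for every volume
single_a <- est_mass_coef(volumes, 2.56)
repeated_a <- est_mass_coef(volumes, rep(2.56, length(volumes)))
all.equal(single_a, repeated_a)

# A vector of a values is paired with volumes by position
paired <- est_mass_coef(volumes, as)
by_hand <- c(as[1] * volumes[1] ^ 0.9,
             as[2] * volumes[2] ^ 0.9,
             as[3] * volumes[3] ^ 0.9)
all.equal(paired, by_hand)
```

The vectors should be the same length (or one of them length 1). If they are not, R recycles the shorter one, and only warns when the longer length is not a multiple of the shorter.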

Integrating with `dplyr`

We can also integrate our vectorized functions with dplyr
This lets us use them to repeat calculations for each row in a data frame
Let’s convert our volume and as vectors into a data frame

plant_data = data.frame(volume = volumes, a = as)

To apply vectorized functions to each row in a table we can use mutate

library(dplyr)

plant_data |>
  mutate(masses = est_mass_coef(volume, a))

This is just like we’ve seen using other R functions, but it works with the vectorized functions we write as well

Do Size Estimates Vectorized 2-3.

Apply/Map functions

Not all functions in R are vectorized
So we need a way to repeatedly run these non-vectorized functions
Use apply() and map() functions
We’ll learn the apply family of functions since they are very common, but map is a very similar tidyverse option
These functions take two arguments
The first is a vector of values that we want to run a function on
The second is the function that we want to run on each value in the vector
The apply functions then “apply” the function to each item in the vector
Return a vector or list of the same length
Doesn’t require calculations to work on vectors
Let’s look at this with a version of our function that only calculates mass for volumes less than a maximum size

est_mass_max <- function(volume){
  if (volume < 5) {
    mass <- 2.56 * volume ^ 0.9
  } else {
    mass <- NA
  }
  return(mass)
}

If we try to run this function on our volumes vector it won’t work because if statements are designed for a single value, not a vector

est_mass_max(volumes)
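
A sketch of what goes wrong. The comparison volumes < 5 is itself vectorized, so if() receives three TRUE/FALSE values instead of one. In recent versions of R (4.2 and later) this is an error; older versions gave a warning and used only the first value. tryCatch() is used here only so the example can capture the message instead of stopping.

```r
est_mass_max <- function(volume){
  if (volume < 5) {
    mass <- 2.56 * volume ^ 0.9
  } else {
    mass <- NA
  }
  return(mass)
}

volumes <- c(1.6, 5.6, 3.1)

# The comparison returns one TRUE/FALSE per volume
condition <- volumes < 5
condition

# if() needs exactly one TRUE or FALSE, so the call fails.
# Capture the message (error or warning, depending on R version).
problem <- tryCatch(
  est_mass_max(volumes),
  error = function(e) conditionMessage(e),
  warning = function(w) conditionMessage(w)
)
problem
```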

Instead we can use one of the apply() functions

sapply & lapply

We’ll start with sapply()
This function takes two arguments
The first is a single vector
The second is the function that we want to “apply” to each element of that vector (or list)
So if we use our volumes vector and our new est_mass_max() function
sapply() will run the est_mass_max() function on each value in volumes, one value at a time

sapply(volumes, est_mass_max)

Under the surface this is the same as running our est_mass_max() function on the first item in volumes
Then running it on the second value in volumes and then the third value in volumes
And then storing those values together in a vector

c(est_mass_max(volumes[1]), est_mass_max(volumes[2]), est_mass_max(volumes[3]))

This lets us do the same action on many things with a single line of code
Handful of similar functions in apply() family
Differ depending on type of input and output data
The s in sapply stands for “simplify”
It will try to return the simplest object possible, in this case a vector
lapply returns a “list”

lapply(volumes, est_mass_max)

This is a more complicated, but also more flexible, data structure that we don’t see much in this class, but it’s useful to know the difference between lapply and sapply.
We can store anything in a list, so if you had a function that made a bunch of graphs or a bunch of data frames lapply would let you work with them
Likewise both of these functions can also take a list as input allowing you to accomplish more complicated things
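
A short sketch of the difference between the two, plus one case where a list is needed. make_row() is a hypothetical helper written only for this example; it is not part of the lesson's code.

```r
est_mass_max <- function(volume){
  if (volume < 5) {
    mass <- 2.56 * volume ^ 0.9
  } else {
    mass <- NA
  }
  return(mass)
}

volumes <- c(1.6, 5.6, 3.1)

as_vector <- sapply(volumes, est_mass_max)
as_list <- lapply(volumes, est_mass_max)

is.numeric(as_vector)   # sapply() simplified the results to a vector
is.list(as_list)        # lapply() kept one list element per volume

# unlist() flattens the list into the same values sapply() returned
all.equal(unlist(as_list), as_vector)

# A list can hold things a vector cannot, such as data frames
make_row <- function(volume){
  data.frame(volume = volume, mass = est_mass_max(volume))
}
rows <- lapply(volumes, make_row)

# Stack the one-row data frames into a single table
combined <- do.call(rbind, rows)
combined
```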

Do Size Estimates With Maximum.

Apply with multiple arguments

mapply() for functions with multiple arguments
Vegetation type specific equations

est_mass_type <- function(volume, veg_type){
  if (veg_type == "shrub"){
    mass <- 2.65 * volume^0.9
  } else {
    mass <- NA
  }
  return(mass)
}

est_mass_type(1.6, "shrub")
plant_types = c("shrub", "tree", "shrub")
est_mass_type(volumes, plant_types) # Error

Doesn’t vectorize, due to conditionals
Use an apply function instead
mapply() because “multiple” inputs

mapply(est_mass_type, volumes, plant_types)

First argument is function
All other arguments are arguments for the function
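
A sketch of how mapply() pairs up its inputs. Naming the vectors (volume = and veg_type =) is optional, but it makes clear which argument of est_mass_type each one fills.

```r
est_mass_type <- function(volume, veg_type){
  if (veg_type == "shrub"){
    mass <- 2.65 * volume^0.9
  } else {
    mass <- NA
  }
  return(mass)
}

volumes <- c(1.6, 5.6, 3.1)
plant_types <- c("shrub", "tree", "shrub")

# One call per position: first volume with first type, and so on
masses <- mapply(est_mass_type, volume = volumes, veg_type = plant_types)
masses

# The same thing written out one pair at a time
by_hand <- c(est_mass_type(volumes[1], plant_types[1]),
             est_mass_type(volumes[2], plant_types[2]),
             est_mass_type(volumes[3], plant_types[3]))
all.equal(masses, by_hand)
```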

Do Task 1 in Size Estimates By Name Apply.

map functions from purrr package are similar to apply
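
For comparison, a rough sketch of the purrr versions of the calls used above. It assumes the purrr package is installed; map(), map_dbl(), and map2_dbl() are purrr functions, and the pairing with lapply(), sapply(), and mapply() is an approximate correspondence rather than an exact one.

```r
library(purrr)

est_mass_max <- function(volume){
  if (volume < 5) {
    mass <- 2.56 * volume ^ 0.9
  } else {
    mass <- NA
  }
  return(mass)
}

est_mass_type <- function(volume, veg_type){
  if (veg_type == "shrub"){
    mass <- 2.65 * volume^0.9
  } else {
    mass <- NA
  }
  return(mass)
}

volumes <- c(1.6, 5.6, 3.1)
plant_types <- c("shrub", "tree", "shrub")

# map() always returns a list, like lapply()
mass_list <- map(volumes, est_mass_max)

# map_dbl() always returns a numeric vector, similar to sapply() here
mass_vector <- map_dbl(volumes, est_mass_max)

# map2_dbl() walks along two inputs together, similar to mapply() here
mass_by_type <- map2_dbl(volumes, plant_types, est_mass_type)
```

Note the argument order: the purrr functions take the data first and the function last, while mapply() takes the function first.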

Integrating with dplyr

Let’s create a new data frame, plant_type_data, that holds our volumes and our plant types

plant_type_data = data.frame(volume = volumes,
                        plant_type = plant_types)

The basic integration with dplyr we used for vectorized functions won’t work with non-vectorized functions

plant_type_data |>
  mutate(masses = est_mass_type(volume, plant_type))

Error because est_mass_type isn’t vectorized
By default dplyr runs this function by converting the individual columns to vectors and running the function on those vectors
Just like when we tried to run the function on the vectors
To get around this we add the function rowwise to our dplyr pipeline
This tells dplyr to work with the data one row at a time, like an apply function

plant_type_data |>
  rowwise() |>
  mutate(masses = est_mass_type(volume, plant_type))
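
One way to picture working one row at a time: each row's volume and plant type are handed to the function as a pair, much like mapply(). The base R sketch below does that same calculation for comparison only; it is not a description of how dplyr is implemented.

```r
est_mass_type <- function(volume, veg_type){
  if (veg_type == "shrub"){
    mass <- 2.65 * volume^0.9
  } else {
    mass <- NA
  }
  return(mass)
}

plant_type_data <- data.frame(volume = c(1.6, 5.6, 3.1),
                              plant_type = c("shrub", "tree", "shrub"))

# Call est_mass_type once per row, pairing the two columns by position,
# and store the results as a new column
plant_type_data$masses <- mapply(est_mass_type,
                                 plant_type_data$volume,
                                 plant_type_data$plant_type)
plant_type_data
```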

Do Task 2 in Size Estimates By Name Apply.

One result per group

We can also combine functions with group_by and summarize to repeat a calculation for each group
These functions need to take a vector as input and return a single value as output
So, let’s write a function that calculates the biomass (the sum of the individual masses) for each plant type

get_biomass <- function(volumes){
  masses <- est_mass(volumes)
  biomass <- sum(masses)
  return(biomass)
}

This function takes a vector of volumes as input and returns a single value, the biomass
We can then group our data by plant_type
And summarize by our function to calculate the biomass for each group

plant_type_data |>
  group_by(plant_type) |>
  summarize(biomass = get_biomass(volume))

Other apply functions (optional)

There are a few other apply functions
vapply() works like sapply(), but you have to tell it what type the returned vector will be
tapply() works like sapply(), but lets you provide a single grouping field (kind of like group_by() in dplyr)
apply() works on multi-dimensional data
Set MARGIN to tell it which dimension to calculate along
1 for rows
2 for columns

counts = data.frame(sp1 = c(5, 4, 7, 6), sp2 = c(6, 2, 6, 9), sp3 = c(8, 16, 1, 0))
counts
apply(X = counts, MARGIN = 1, FUN = sum)
apply(X = counts, MARGIN = 2, FUN = sum)
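
A brief sketch of vapply() and tapply(), reusing the volumes and plant types from earlier. Summing the volumes per plant type is just an illustrative calculation chosen for this example.

```r
est_mass_max <- function(volume){
  if (volume < 5) {
    mass <- 2.56 * volume ^ 0.9
  } else {
    mass <- NA
  }
  return(mass)
}

volumes <- c(1.6, 5.6, 3.1)
plant_types <- c("shrub", "tree", "shrub")

# vapply(): like sapply(), but FUN.VALUE states that every call
# must return a single number, otherwise vapply() stops with an error
masses <- vapply(volumes, est_mass_max, FUN.VALUE = numeric(1))
masses

# tapply(): apply sum() to the volumes separately for each plant type
volume_by_type <- tapply(volumes, plant_types, sum)
volume_by_type
```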
