Vectors
 A sequence of values with the same type
 Create using c(), which stands for “combine”
sites <- c("a", "a", "b", "c")

str(sites)
 Slicing:
 Use []
 In general [] in R means “give me a piece of something”
sites[1]
sites[1:3]
 1:3 makes a vector. So, this is the same as
sites[c(1, 2, 3)]
sites[c(4, 1, 3)]
 You can use a vector to get any subset or order you want
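 A minimal sketch of the slicing rules above, reusing the sites vector from these notes. The variable names first_three, same_slice and reordered are made up for illustration.

```r
# Same example vector as in the notes
sites <- c("a", "a", "b", "c")

# 1:3 is itself a vector, so these two slices select the same elements
first_three <- sites[1:3]
same_slice <- sites[c(1, 2, 3)]
identical(first_three, same_slice)   # TRUE

# Positions can be given in any order
reordered <- sites[c(4, 1, 3)]
print(reordered)                     # "c" "a" "b"
```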
 Math functions:
length(sites)
density_ha <- c(2.8, 3.2, 1.5, 3.8)
mean(density_ha)
max(density_ha)
min(density_ha)
sum(density_ha)
Do Bird Banding 14.
Null values
 So far we’ve worked with data with no missing values
 How many of you have missing values in your data?
density_ha <- c(2.8, 3.2, 1.5, NA)
mean(density_ha)
 Why did we get NA?
 Hard to say what a calculation including NA should be
 So most calculations return NA when NA is in the data
 Can tell many functions to remove the NA before calculating
mean(density_ha, na.rm = TRUE)
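 One way to see what is going on, as a rough sketch: is.na() is a base R function that these notes do not otherwise cover, and it is used here only to show where the missing value sits. The name mean_density is made up for illustration.

```r
density_ha <- c(2.8, 3.2, 1.5, NA)

# is.na() flags the missing positions
is.na(density_ha)          # FALSE FALSE FALSE TRUE
sum(is.na(density_ha))     # 1 missing value

# Without na.rm the NA propagates through the calculation
mean(density_ha)           # NA

# With na.rm = TRUE the mean is taken over 2.8, 3.2 and 1.5 only
mean_density <- mean(density_ha, na.rm = TRUE)
print(mean_density)        # 2.5
```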
Working with multiple vectors
 Vector math combines values in the same position
 Elementwise: operating on one element at a time
density_ha <- c(2.8, 3.2, 1.5, 3.8)
area_ha <- c(3, 5, 1.9, 2.7)
total_number <- density_ha * area_ha
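 To make “elementwise” concrete, the sketch below spells out the same multiplication with an explicit loop. The loop is for comparison only, and total_loop is a made-up name. The vectorized form is the one you would normally write.

```r
density_ha <- c(2.8, 3.2, 1.5, 3.8)
area_ha <- c(3, 5, 1.9, 2.7)

# Elementwise product: position i of the result is density_ha[i] * area_ha[i]
total_number <- density_ha * area_ha

# The same calculation written one element at a time
total_loop <- numeric(length(density_ha))
for (i in seq_along(density_ha)) {
  total_loop[i] <- density_ha[i] * area_ha[i]
}

# Both approaches give the same four values
all.equal(total_number, total_loop)
```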
 Subsetting is done using [], like slicing
area_ha[sites == 'a']
 == means “equal to” in most languages. Not =, which is used for assignment.
 Can also do “not equal to”
area_ha[sites != 'a']
 Greater or less than
sites[area_ha > 3]
sites[area_ha >= 3]
sites[area_ha < 3]
 And we can subset a vector based on itself
sites[sites != 'a']
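 It can help to look at the comparison on its own before using it inside []. A small sketch, with is_site_a, area_at_a and large_sites as made-up names:

```r
sites <- c("a", "a", "b", "c")
area_ha <- c(3, 5, 1.9, 2.7)

# The comparison produces one TRUE/FALSE per element
is_site_a <- sites == "a"
print(is_site_a)             # TRUE TRUE FALSE FALSE

# [] keeps the positions where the logical vector is TRUE
area_at_a <- area_ha[is_site_a]
print(area_at_a)             # 3 5

# A condition on one vector can select from another of the same length
large_sites <- sites[area_ha >= 3]
print(large_sites)           # "a" "a"
```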
Data frames

A list of equal length vectors grouped together

data.frame()
surveys <- data.frame(sites, density_ha, area_ha)
 Useful commands:
str(surveys)
length(surveys)
nrow(surveys)
ncol(surveys)
 Subsetting:
 [row, column]
surveys[1, 2]
surveys[1:2, 2:3]
surveys[, 3]
surveys["area_ha"]
surveys[c("area_ha", "sites")]
surveys$area_ha
surveys[["area_ha"]]
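 The different subsetting forms do not all return the same kind of object. The sketch below rebuilds the surveys data frame from these notes and checks what each form gives back. Names such as area_df and area_vec are made up for illustration.

```r
sites <- c("a", "a", "b", "c")
density_ha <- c(2.8, 3.2, 1.5, 3.8)
area_ha <- c(3, 5, 1.9, 2.7)
surveys <- data.frame(sites, density_ha, area_ha)

# length() counts columns, because a data frame is a list of vectors
n_cols <- length(surveys)    # 3
n_rows <- nrow(surveys)      # 4

# Single brackets with a name return a one-column data frame
area_df <- surveys["area_ha"]
class(area_df)               # "data.frame"

# $ and [[ ]] return the underlying vector
area_vec <- surveys$area_ha
class(area_vec)              # "numeric"
identical(area_vec, surveys[["area_ha"]])   # TRUE

# [row, column] with a single cell returns the value itself
surveys[1, 2]                # 2.8
```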
Reading in external data
read.csv()
 Main argument is the location of the data: a url or a path on your computer
 Go to the Datasets page on the site and copy the Shrub dimensions url
shrub_data <- read.csv('https://datacarpentry.org/semesterbiology/data/shrubdimensionslabeled.csv')
Factors
str(shrub_data)
 The shrubID column has type Factor (in R versions before 4.0.0, where stringsAsFactors defaults to TRUE)
 Special data type in R for categorical data
 Useful for statistics, but can mess up some aspects of computation
 Can eliminate during imports with stringsAsFactors
shrub_data <- read.csv('https://datacarpentry.org/semesterbiology/data/shrubdimensionslabeled.csv', stringsAsFactors = FALSE)
str(shrub_data)
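 The effect of stringsAsFactors can be tried without a network connection. The sketch below uses a few made-up rows passed through the text argument of read.csv() in place of the real shrub file. Only the shrubID column name comes from these notes. The length column and all values are hypothetical.

```r
# Hypothetical inline data standing in for the shrub file (values are made up)
csv_text <- "shrubID,length
a1,2.2
a2,3.5
a1,2.9"

# Explicitly request factors, as older R versions did by default
with_factors <- read.csv(text = csv_text, stringsAsFactors = TRUE)
class(with_factors$shrubID)      # "factor"
levels(with_factors$shrubID)     # "a1" "a2"

# Keep the column as plain character strings instead
as_strings <- read.csv(text = csv_text, stringsAsFactors = FALSE)
class(as_strings$shrubID)        # "character"
```

 Setting the argument explicitly, as above, gives the same result whichever R version runs the code.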
Start Shrub Volume Data Frame, but just use the url instead of downloading the file.