Reference

Last updated on 2023-10-19 | Edit this page

Glossary


The definitions below are modified from the Carpentries Glosario (CC-BY-4.0)

aggreation
To combine many values into one, e.g., by summing a set of numbers.
field
A component of a record containing a single value. Every record in a database table has the same fields.
filter
To choose a set of records (i.e., rows of a table) based on the values they contain.
full join
A join that returns all rows and all columns from two tables A and B. Where the keys of A and B match, values are combined; where they do not, missing values from either table are filled with null, NA, or some other missing value signifier.
group
To divide data into subsets according to a set of criteria while leaving records in a single structure.
inner join
A join that returns the combination of rows from two tables, A and B, whose keys exist in both tables.
join
One of several operations that combine values from two tables.
key
A field or combination of fields whose value(s) uniquely identify a record within a table or dataset. Keys are often used to select specific records and in joins.
left join
A join that combines data from two tables, A and B, where keys in table A match keys in table B, fields are concatenated. Where a key in table A does not match a key in table B, columns from table B are filled with null, NA, or some other missing value. Keys from table B that do not match keys from table A are excluded for the result.
missing value
A special value such as null or NA used to indicate the absence of data. Missing values can signal that data was not collected or that the data did not exist in the first place (e.g., the middle name of someone who does not have one).
null
A special value used to represent a missing object.
record
A group of related values that are stored together. A record may be represented as a tuple or as a row in a table; in the latter case, every record in the table has the same fields.
relational_database
A database that organizes information into tables, each of which has a fixed set of named fields (shown as columns) and a variable number of records (shown as rows).
right join
A join that combines data from two tables, A and B. Where keys in table A match keys in table B, fields are concatenated. Where a key in table B does *not* match a key in table A, columns from table A are filled with null, NA, or some other missing value signifier. Keys from table A that do not exist in table B are dropped.
select
To choose entire columns or rows from a table by name or location.
SQL
The language used for writing queries for a relational database. The term is an acronym for Structured Query Language.
table
A set of records in a relational database or observations in a data frame. Tables are usually displayed as rows (each of which represents one record or observation) and columns (each of which represents a field or variable.)

Commands


See this cheat sheet for an list of the commands covered in this lesson.

Keywords

Keyword Definition Example Description
SELECT Select data from database or table SELECT * Selects the entire dataset
SELECT column1 Selects a particular column
SELECT 1 + 2 Performs a calculation
FROM Indicates the table from which the data is selected or deleted SELECT year
FROM surveys
Query will display the desired column from the table
WHERE Filter the result set so that only the values satisfying the condition are included SELECT *
FROM surveys
WHERE year == 1990
Query will display all values from the table for which the year is 1990
LIMIT Retrieves the number of records from the table up to the limit value SELECT *
FROM surveys
LIMIT 5
Query will will return only the first 5 rows from the table
DISTINCT Select distinct values SELECT DISTINCT year
FROM surveys
Query will display the distinct years present on the table
AS Used as an alias to rename the column or table SELECT 1 + 2 AS calc Column will be renamed to “calc”
GROUP BY Groups the result set SELECT MAX(weight)
FROM surveys
GROUP BY year
Query will display the max weight per year
HAVING Used to filter grouped rows when using aggregate functions SELECT MAX(weight)
FROM surveys
GROUP BY year HAVING MAX(weight) > 100
Filter the results by the years that have a maximum weight greater than 100g
JOIN Joins tables SELECT *
FROM surveys
JOIN species
ON surveys.species_id = species.species_id
Query will display all the columns from both tables where the condition is met