This lesson is in the early stages of development (Alpha version)

Data Management with SQL for Social Scientists *alpha*: Glossary

Key Points

What is a relational database?
  • A relational database is data organised as a collection of related tables

  • SQL (Structured Query Language) is used to extract data from the tables. Either a single table or data spread across two or more related tables.

  • A schema, which describes the data in a table, has to be created before data can be added

  • The schema can be used to provide some data validation on input

Using DB Browser for SQLite
  • The DB Browser for SQLite application allows you to connect to an existing database or create a new database

  • When connected to a database you can create new tables

  • When connected to a database you can write and run SQL queries and view the results

  • You can save the results of a query to a file

The Select Statement
  • Strictly speaking SQL is a standard, not a particular implementation

  • SQL implementation are sufficiently close that you only have to learn SQL once

  • The DDL constructs are used to create tables and other database objects

  • The DML constructs, typically the SELECT statement is used to retrieve data from one or more tables

  • The SELECT statement allows you to ‘slice’ and ‘dice’ the columns and rows of the dataset so that the query only returns the data of interest

Missing Data
  • You should expect missing data

  • You need to know how missing data is being represented in your dataset

  • Database systems always represent what they consider to be missing data as NULL

  • You can explicitly test for NULL values in your data

  • You may need other tests for different representations of NULL

Creating New Columns
  • New result columns can be created using arithmetic operators or builtin functions

  • New columns have to be given names or Alias’

  • The Case coding structure can be used to create new columns

  • The new columns are only in the query results. The original table is not changed

Aggregations
  • Builtin functions can be used to produce a variety of summary statistics

  • The DISTINCT keyword can be used to find the unique set of values in a column or columns

  • Data in columns can be summarised by values using the GROUP BY clause

  • Summarised data can be filtered using the HAVING clause

Creating tables and views
  • Database tables can be created using the DDL command ‘Create Table’

  • They can be populated using the ‘INSERT INTO ‘ command

  • The SQLite plugin allows you to create a table and import data into it in one step

  • There are many options available to the ‘CREATE TABLE command which allows greater control over how or what data can be loaded into the table

  • A View can be treated just like a table in a query

  • A View does not contain data like a table does, only the instructions on how to get the data

The SQLite command line
  • SQLite databases can be created, managed and queried from the SQLite shell utility

  • You can run the shell interactively from the commandline, typing queries or dot cammands at the prompt

  • You can call the SQLite3 program and specify a database and a set of commands to run. This aids automation

Joins
  • Joins are used to combine data from two or more tables.

  • Tables to be joined must have a column in each which represent the same thing

  • There are several different types of joins

  • The Inner join is the most commonly use

  • You may have to use the other join types to discover missing data or gaps in your data

Using database tables in other environments
  • ODBC - Open DataBase Connector allows a database to be connected to a program or application

  • Each database system has its own ODBC connectors

  • Programs such as Excel allow you to use ODBC to get data from databases

  • Programming languages such as Python and R provide libraries which facilitate ODBC connections

Glossary

FIXME