What is a relational database?
|
A relational database is data organised as a collection of related tables
SQL (Structured Query Language) is used to extract data from the tables. Either a single table or data spread across two or more related tables.
A schema, which describes the data in a table, has to be created before data can be added
The schema can be used to provide some data validation on input
|
Using DB Browser for SQLite
|
The DB Browser for SQLite application allows you to connect to an existing database or create a new database
When connected to a database you can create new tables
When connected to a database you can write and run SQL queries and view the results
You can save the results of a query to a file
|
The Select Statement
|
Strictly speaking SQL is a standard, not a particular implementation
SQL implementation are sufficiently close that you only have to learn SQL once
The DDL constructs are used to create tables and other database objects
The DML constructs, typically the SELECT statement is used to retrieve data from one or more tables
The SELECT statement allows you to ‘slice’ and ‘dice’ the columns and rows of the dataset so that the query only returns the data of interest
|
Missing Data
|
You should expect missing data
You need to know how missing data is being represented in your dataset
Database systems always represent what they consider to be missing data as NULL
You can explicitly test for NULL values in your data
You may need other tests for different representations of NULL
|
Creating New Columns
|
New result columns can be created using arithmetic operators or builtin functions
New columns have to be given names or Alias’
The Case coding structure can be used to create new columns
The new columns are only in the query results. The original table is not changed
|
Aggregations
|
Builtin functions can be used to produce a variety of summary statistics
The DISTINCT keyword can be used to find the unique set of values in a column or columns
Data in columns can be summarised by values using the GROUP BY clause
Summarised data can be filtered using the HAVING clause
|
Creating tables and views
|
Database tables can be created using the DDL command ‘Create Table’
They can be populated using the ‘INSERT INTO ‘ command
The SQLite plugin allows you to create a table and import data into it in one step
There are many options available to the ‘CREATE TABLE command which allows greater control over how or what data can be loaded into the table
A View can be treated just like a table in a query
A View does not contain data like a table does, only the instructions on how to get the data
|
The SQLite command line
|
SQLite databases can be created, managed and queried from the SQLite shell utility
You can run the shell interactively from the commandline, typing queries or dot cammands at the prompt
You can call the SQLite3 program and specify a database and a set of commands to run. This aids automation
|
Joins
|
Joins are used to combine data from two or more tables.
Tables to be joined must have a column in each which represent the same thing
There are several different types of joins
The Inner join is the most commonly use
You may have to use the other join types to discover missing data or gaps in your data
|
Using database tables in other environments
|
ODBC - Open DataBase Connector allows a database to be connected to a program or application
Each database system has its own ODBC connectors
Programs such as Excel allow you to use ODBC to get data from databases
Programming languages such as Python and R provide libraries which facilitate ODBC connections
|