The Select Statement
OverviewTeaching: 15 min
Exercises: 10 minQuestions
What is SQL?
How can I return specific columns from a table?
How can I return specific rows from a table?Objectives
Explain how SQL is used to access relational database tables
Understand the difference between DDL and DML
Create simple SQL queries to return rows and columns from existing tables
Construct more complex logical expressions for use in WHERE clauses
Return sorted results from a query
Definition of SQL
SQL or Structured Query Language is an internatinal standard for manipulating data in a relational database. Each Relational Database system like Oracle, MySQL or SQLite implements its own variation of the standard.
Fortunateley for the types of commands and queries that we will want to write, all of the implementations are much in agreement. The SELECT queries we will be writing to access data in our SQLite database will execute un-altered in many of the other environments.
Essentially you only have to learn SQL once.
SQL and Relational database tables
The strength of SQL is that a single SQL statement or query can request data be returned from one or many of the tables in the database. You can essentially define the relationships between tables on-the-fly as part of your query statement. Relationships between tables are often included as part of the overall database design. In our situation we may be getting an assortment of tables from different sources so being able to imply the relatuionship as part of the query has definite advantages.
DDL and DML
DDL stands for Data Definition Language. It is the set of SQL comands used to create alter of delete database objects such as tables.
DML stands for Data Manipulation Language. For our purposes this is the SELECT command which is used to extract data items from one or more of the database tables.
Simple SQL queries using the Select statement
For the rest of this episode we will be looking at the SELECT statement.
To follow along, you should open the DB Browser application and connect to the SQL_SAFI database.
In SQL, querying data is performed by a SELECT statement. A select statement has 6 key components;
SELECT colnames FROM tablename WHERE conditions GROUP BY colnames HAVING conditions ORDER BY colnames
In practice very few queries will have all of these clauses in them simplifying many queries. On the other hand, conditions in the WHERE clause can be arbitarily complex and if you need to JOIN two or more tables together then more clauses (JOIN and ON) are needed.
All of the clause names above have been written in uppercase for clarity. SQL is not case sensitive. Neither do you need to write each clause on a new line, but it is often clearer to do so for all but the simplest of queries.
In this episode we will start with the very simple and work our way up to the more complex.
The simplest query is effectively one which returns the contents of the whole table
SELECT * FROM Farms;
It is better practice and generally more efficient to explicitly list the column names that you want returned.
SELECT Country, A06_province, A07_district, A08_ward, A09_village FROM Farms;
The ‘*’ character acts as a wildcard meaning all of the columns but you cannot use it as a general wildcard. So for example, the following is not valid.
SELECT A* FROM Farms;
If you run it you will get an error. When an error does occur you will see an error message displayed in the bottom pane.
In addition to limiting the columns returned by a query, you can also limit the rows returned.
The simplest case is to say how many rows are wanted using the
In the example below only the first ten rows of the result of the query will be returned.
This is useful if you just want to get a feel for what the data looks like.
SELECT * FROM Farms LIMIT 10;
Write a query which returns the first 5 rows from the Farms table with only the columns Q1,Q2,Q3,Q4 and numage.
SELECT Id , B16_years_liv , B17_parents_liv , B18_sp_parents_liv , B19_grand_liv , B20_sp_grand_liv FROM Farms LIMIT 5;
Because te query uses several columns (with longish names), for readability they have been set out on seperate lines. SQL takes of white space to you are free to arrange the text of the query as you like.
Usually you will want to restrict the rows returned based on some criteria. i.e. certain values or ranges within one or more columns.
In this example we are only interested in rows where the value in the B16_years_liv column is greater than 25
SELECT Id, B16_years_liv FROM Farms WHERE B16_years_liv > 25 ;
In addition to using the ‘>’ we can use many other operators such as <, <=, =, >=, <>
SELECT Id, B17_parents_liv FROM Farms WHERE B17_parents_liv = 'yes' ;
Using more complex logical expressions in the
We can also use the AND and OR keywords to build more complex selection criteria.
SELECT Id FROM Farms WHERE B17_parents_liv = 'yes' AND B18_sp_parents_liv = 'yes' AND B19_grand_liv = 'yes' AND B20_sp_grand_liv = 'yes' ;
Notice that the columns being used in the
WHERE clause do not need to returned as part of the
You can ensure the precedence of the operators by using brackets. Judicious use of brackets can also aid readability
SELECT Id FROM Farms WHERE (B17_parents_liv = 'yes' OR B18_sp_parents_liv = 'yes') AND B16_years_liv > 60 ;
From the above query, breakdown the
Whereclause so that each component can be tested indiviually. Make a note of how many rows are returned in each case.
To test each of the
SELECT Id FROM Farms WHERE B17_parents_liv = 'yes' ; SELECT Id FROM Farms WHERE B18_sp_parents_liv = 'yes' ; SELECT Id FROM Farms WHERE (B17_parents_liv = 'yes' OR B18_sp_parents_liv = 'yes') ; SELECT Id FROM Farms WHERE B16_years_liv > 60 ;
ORgenerally creatres a less restrictive condition and
ANDmakes a more restrictive condition.
The following query returns the rows where the value of B16_years_liv is in the range 51 to 59 inclusive.
SELECT Id, B16_years_liv FROM Farms WHERE B16_years_liv > 50 AND B16_years_liv < 60 ;
The same results could be obtained by using the BETWEEN or IN operators
SELECT Id, B16_years_liv FROM Farms WHERE B16_years_liv BETWEEN 51 AND 59 ;
SELECT Id, B16_years_liv FROM Farms WHERE B16_years_liv IN (51, 52, 53, 54, 55, 56, 57, 58, 59) ;
The list of values in brackets do not have to be contiguous or even in order.
Write a query using the Farms table which returns the columns Id, A09_village, A11_years_farm, B16_years_liv. We are only interested in rows where the A09_village value is either ‘God’ or ‘Ruaca’. Additionally we only want A11_years_farm values in the range 20 to 30 exclusive and B16_years_liv values strictly greater than 40. There are many ways of doing this, but try to use an inequality, an
INclause and a
SELECT Id, A09_village, A11_years_farm, B16_years_liv FROM Farms WHERE A09_village IN ('God', 'Ruaca') AND A11_years_farm BETWEEN 21 AND 29 AND B16_years_liv > 40 ;
If you want the results of your query to appear in a specific order, you can use the ORDER BY clause
SELECT Id, A09_village, A11_years_farm, B16_years_liv FROM Farms WHERE A09_village = 'God' ORDER BY A11_years_farm ;
By default the SQL assumes Ascending order. You can make this more explicit by using the ASC or DESC keywords.
SELECT Id, A09_village, A11_years_farm, B16_years_liv FROM Farms WHERE A09_village = 'God' ORDER BY A11_years_farm DESC ;
You can also order by multiple columns
SELECT Id, A09_village, A11_years_farm, B16_years_liv FROM Farms WHERE A09_village = 'God' ORDER BY A11_years_farm DESC , B16_years_liv ASC ;
Strictly speaking SQL is a standard, not a particular implementation
SQL implementation are sufficiently close that you only have to learn SQL once
The DDL constructs are used to create tables and other database objects
The DML constructs, typically the SELECT statement is used to retrieve data from one or more tables
The SELECT statement allows you to ‘slice’ and ‘dice’ the columns and rows of the dataset so that the query only returns the data of interest