This is an alpha lesson to teach Data Management with SQL for Social Scientists. We welcome any criticism or reports of errors, and will take your feedback into account to improve both the presentation and the content.
This lesson is not currently under active maintenance. You are welcome to teach the lesson and contribute changes to the content, but you may have to wait longer than usual for any contributions to be processed. If you are interested in volunteering as a Maintainer on this lesson, please contact The Carpentries Curriculum Team or open an issue in this repository.
Databases are useful for both storing and using data effectively. Using a relational database serves several purposes.
- It keeps your data separate from your analysis. This means there’s no risk of accidentally changing data when you analyze it.
- If we get new data we can rerun a query to find all the data that meets certain criteria.
- It’s fast, even for large amounts of data.
- It improves quality control of data entry, for example through type constraints and the use of forms in Access, FileMaker, etc.
- The concepts of relational database querying are core to understanding how to do similar things using programming languages such as R or Python.
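
To make two of these points concrete, here is a minimal sketch using Python's built-in `sqlite3` module. It shows a query being rerun after new data arrives, and a constraint rejecting an invalid entry. The `households` table, its columns, and all values are made up for illustration and are not taken from the lesson's dataset.

```python
import sqlite3

# Hypothetical in-memory database; table, columns, and rows are illustrative only.
conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE households (
        id INTEGER PRIMARY KEY,
        village TEXT NOT NULL,
        members INTEGER NOT NULL CHECK (members > 0)
    )
    """
)
conn.executemany(
    "INSERT INTO households (village, members) VALUES (?, ?)",
    [("Village A", 3), ("Village B", 7), ("Village A", 9)],
)

# The analysis lives in the query, separate from the stored data.
QUERY = "SELECT village, members FROM households WHERE members >= 7 ORDER BY id"


def large_households(connection):
    """Return every household with at least seven members."""
    return connection.execute(QUERY).fetchall()


before = large_households(conn)

# New data arrives: the same query is simply rerun, unchanged.
conn.execute(
    "INSERT INTO households (village, members) VALUES (?, ?)", ("Village C", 8)
)
after = large_households(conn)

# The CHECK constraint acts as quality control on data entry.
try:
    conn.execute(
        "INSERT INTO households (village, members) VALUES (?, ?)", ("Village D", -2)
    )
    rejected = False
except sqlite3.IntegrityError:
    rejected = True

print(before)
print(after)
print(rejected)
```

The query text never changes between the two calls; only the data does. This is one way to read the "rerun a query" point above, and the failed insert shows the kind of entry error a type or value constraint can catch.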
This lesson will teach you what relational databases are, how you can load data into them and how you can query databases to extract just the information that you need.
We expect you to have learned a bit about the SAFI dataset in the spreadsheet and OpenRefine sessions. This is not strictly necessary, but it will greatly improve your ability to understand the power of SQL and when to use it rather than another tool.