Data Carpentry’s aim is to teach researchers basic concepts, skills, and tools for working with data so that they can get more done in less time, and with less pain. This workshop teaches data management and analysis for social science research, including best practices for data organization in spreadsheets, reproducible data cleaning with OpenRefine, and data analysis and visualization in R. The curriculum is designed to be taught over two full days of instruction.
Materials for teaching data analysis and visualization in Python and extraction of information from relational databases using SQL are in development.
Interested in teaching these materials? We have an onboarding video and accompanying slides available to prepare Instructors to teach these lessons. After watching the video, please contact email@example.com so that we can record your status as an onboarded Instructor. Instructors who have completed onboarding will be given priority for teaching at centrally organized Data Carpentry Social Sciences workshops.
This curriculum assumes no prior experience with the tools covered in the workshop. Participants should bring their laptops and plan to participate actively.
This curriculum uses the same dataset throughout.
Studying African Farmer-led Irrigation (SAFI) dataset
The SAFI Project studies farming and irrigation methods used by farmers in Tanzania and Mozambique. The dataset is composed of survey data relating to households and agriculture in those two countries. The survey form was defined in an Excel spreadsheet using the ODK (Open Data Kit) software. The spreadsheet is used to generate a form that can be downloaded, displayed, and completed on an Android smartphone. The results are then sent back to a central server, which can export the collected data in both JSON and CSV formats. We will use a sample of the collected data in CSV format throughout this workshop. The data can be downloaded from Figshare.
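As a rough illustration of what working with survey data in CSV format looks like, the sketch below parses a few rows and computes a grouped summary. It is written in Python only to keep it self-contained; the workshop itself uses R. The column names (`household_id`, `village`, `household_size`) and all values are invented for this example and are not taken from the real SAFI file.

```python
import csv
import io
import statistics

# Hypothetical stand-in for a few survey responses. The column names and
# values are invented for illustration; they are NOT the real SAFI data.
SAMPLE_CSV = """household_id,village,household_size
1,VillageA,3
2,VillageA,7
3,VillageB,10
4,VillageB,4
"""


def load_rows(text):
    """Parse CSV text into a list of dicts, one per survey response."""
    return list(csv.DictReader(io.StringIO(text)))


def mean_by_group(rows, group_col, value_col):
    """Return the mean of value_col for each distinct value of group_col."""
    groups = {}
    for row in rows:
        # csv.DictReader yields strings, so convert before averaging.
        groups.setdefault(row[group_col], []).append(float(row[value_col]))
    return {name: statistics.mean(values) for name, values in groups.items()}


rows = load_rows(SAMPLE_CSV)
summary = mean_by_group(rows, "village", "household_size")
for village, mean_size in sorted(summary.items()):
    print(f"{village}: mean household size {mean_size:.1f}")
```

To work with a downloaded file instead of the inline sample, the same `csv.DictReader` call can be given an open file handle, assuming the file has a header row.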
This curriculum is currently available using R as the main programming language. Materials for teaching it with Python are in development.
|Lesson|Overview|
|---|---|
|Data Organization in Spreadsheets|Learn how to organize tabular data, handle date formatting, carry out quality control and quality assurance, and export data to use with downstream applications.|
|Data Cleaning with OpenRefine|Explore, summarize, and clean tabular data reproducibly.|
|Data Analysis and Visualisation with R|Import data into R, calculate summary statistics, and create publication-quality graphics.|
Workshop Materials in Development
|Lesson|Overview|
|---|---|
|Data Analysis and Visualisation with Python|Import data into Python, calculate summary statistics, and create publication-quality graphics.|
|Data Management with SQL|Extract information from relational databases.|