This lesson is being piloted (Beta version)

Social Science Workshop Overview

Data Carpentry aims to teach researchers the basic concepts, skills, and tools for working with data so that they can get more done in less time and with less pain. This workshop teaches data management and analysis for social science research, including best practices for data organization in spreadsheets, reproducible data cleaning with OpenRefine, and data analysis and visualization in R. The curriculum is designed to be taught over two full days of instruction.

Materials for teaching data analysis and visualization in Python and extraction of information from relational databases using SQL are in development.

Getting Started

This curriculum assumes no prior experience with the tools covered in the workshop. Participants should bring their laptops and plan to participate actively.


This curriculum uses the same dataset throughout.

Studying African Farmer-led Irrigation (SAFI) dataset

The SAFI Project is a research project studying the farming and irrigation methods used by farmers in Tanzania and Mozambique. The dataset consists of survey data on households and agriculture in those two countries. The survey was defined in an Excel spreadsheet, which the ODK (Open Data Kit) software uses to create a form that can be downloaded, displayed, and completed on an Android smartphone. Completed responses are sent back to a central server, which can export the collected data in both JSON and CSV formats. We will use a sample of the collected data in CSV format throughout this workshop. The data can be downloaded from Figshare.
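To make the two export formats mentioned above concrete, here is a minimal Python sketch that serializes the same survey-style records as JSON and as CSV, then reads the CSV back. The records, field names (`household_id`, `village`, `members`), and helper functions are all invented for illustration; they are not taken from the SAFI data or from ODK itself.

```python
import csv
import io
import json

# Hypothetical survey records: field names and values are invented for
# illustration and do not come from the SAFI dataset.
records = [
    {"household_id": 1, "village": "A", "members": 3},
    {"household_id": 2, "village": "B", "members": 7},
]


def to_json(rows):
    """Serialize the records as a JSON array of objects."""
    return json.dumps(rows, indent=2)


def to_csv(rows):
    """Serialize the records as CSV text with a header row."""
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer, fieldnames=list(rows[0].keys()), lineterminator="\n"
    )
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


def read_csv(text):
    """Parse CSV text into a list of dictionaries (every value is a string)."""
    return list(csv.DictReader(io.StringIO(text)))


json_text = to_json(records)
csv_text = to_csv(records)

# CSV carries no type information, so numbers must be converted after reading.
rows = read_csv(csv_text)
mean_members = sum(int(row["members"]) for row in rows) / len(rows)

print(csv_text)
print(mean_members)
```

JSON keeps nesting and value types, while CSV is a flat table that opens directly in spreadsheets, OpenRefine, and R, which is one plausible reason a workshop built around those tools would work from the CSV export.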

More information on this dataset

This curriculum is currently available using R as the main programming language. Materials for teaching it with Python are in development.

Workshop Overview

Lesson | Overview
Data Organization in Spreadsheets | Learn how to organize tabular data, handle date formatting, carry out quality control and quality assurance, and export data to use with downstream applications.
Data Cleaning with OpenRefine | Explore, summarize, and clean tabular data reproducibly.
Data Analysis and Visualization with R | Import data into R, calculate summary statistics, and create publication-quality graphics.

Workshop Materials in Development

Lesson | Overview
Data Analysis and Visualization with Python | Import data into Python, calculate summary statistics, and create publication-quality graphics.
Data Management with SQL | Extract information from relational databases.