Summary and Setup


Data Carpentry’s aim is to teach researchers basic concepts, skills, and tools for working with data so that they can get more done in less time and with less pain. This workshop teaches data management and analysis for social science research, including best practices for organizing data in spreadsheets, reproducible data cleaning with OpenRefine, and data analysis and visualization in R. The curriculum is designed to be taught over two full days of instruction.

Materials for teaching data analysis and visualization in Python and extraction of information from relational databases using SQL are in development.

Interested in teaching these materials? We have an onboarding video and accompanying slides available to prepare Instructors to teach these lessons. After watching this video, please contact so that we can record your status as an onboarded Instructor. Instructors who have completed onboarding will be given priority status for teaching at centrally-organized Data Carpentry Social Sciences workshops.

Getting Started

Data Carpentry’s teaching is hands-on, so participants are encouraged to use their own computers. This ensures their tools are set up properly for an efficient workflow. To use these materials most effectively, please download the data and install all the required software before working through this lesson.

This workshop assumes no prior experience with the tools it covers.

To get started, follow the directions in the Setup tab to get access to the required software and data for this workshop.


The data for this workshop come from the SAFI Survey Results Project, available on FigShare under a CC-BY license that permits reuse.

The SAFI Project is a research project studying the farming and irrigation methods used by farmers in Tanzania and Mozambique. The dataset consists of survey data on households and agriculture in those two countries. The survey form was created with the ODK (Open Data Kit) software from an Excel spreadsheet. The resulting form can be downloaded to an Android smartphone, where it is displayed and completed. Completed responses are sent back to a central server, which can export the collected data in both JSON and CSV formats. Throughout this workshop we will use a sample of the collected data in CSV format.

More information on this dataset

This curriculum currently uses R as its main programming language. Materials for teaching it with Python are in development.

Workshop Overview

Data Organization in Spreadsheets: Learn how to organize tabular data, handle date formatting, carry out quality control and quality assurance, and export data for use with downstream applications.
Data Cleaning with OpenRefine: Explore, summarize, and clean tabular data reproducibly.
Data Analysis and Visualisation with R: Import data into R, calculate summary statistics, and create publication-quality graphics.

Workshop Materials in Development

Data Analysis and Visualisation with Python: Import data into Python, calculate summary statistics, and create publication-quality graphics.
Data Management with SQL: Extract information from relational databases.


This workshop is designed to be run on your laptop. First, you will need to download the data we use in the workshop. Then, you will need to install some software. After following the instructions on this page, you should have everything you need to participate fully in the workshop!

Setup instructions for your workshop