OpenRefine for Social Science Data: Setup


The data for this lesson is part of the Data Carpentry Social Sciences workshop. It is a teaching version of the Studying African Farmer-Led Irrigation (SAFI) database. The SAFI dataset represents interviews with farmers in two countries in eastern sub-Saharan Africa (Mozambique and Tanzania). The interviews were conducted between November 2016 and June 2017 and covered household features (e.g., construction materials used, number of household members), agricultural practices (e.g., water usage), and assets (e.g., number and types of livestock).

This lesson uses a subset of the teaching version that has been intentionally ‘messed up’.

Download the data file to your computer by clicking this link. (direct link:


For this lesson you will need OpenRefine (formerly Google Refine) and a web browser.

Note: OpenRefine is a Java program that runs on your own machine (not in the cloud). Its interface opens in your web browser, but no internet connection is needed.
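Because OpenRefine serves its interface from your own machine, you can check whether it is up by requesting its local address. The snippet below is a minimal sketch, not part of the lesson materials: it assumes OpenRefine was started with its default address, http://127.0.0.1:3333/, and the function name `openrefine_is_running` is a hypothetical helper introduced here for illustration. It only confirms that something answers HTTP requests at that address, not that the responder is OpenRefine.

```python
import urllib.error
import urllib.request

# Assumption: OpenRefine's default local address. Change this if you
# started OpenRefine on a different host or port.
DEFAULT_OPENREFINE_URL = "http://127.0.0.1:3333/"


def openrefine_is_running(url: str = DEFAULT_OPENREFINE_URL,
                          timeout: float = 2.0) -> bool:
    """Return True if something answers HTTP requests at the given URL."""
    # Bypass any system proxy settings: the request should stay on this machine.
    opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
    try:
        with opener.open(url, timeout=timeout) as response:
            return 200 <= response.status < 400
    except (urllib.error.URLError, OSError):
        # Connection refused, timeout, or an HTTP error status.
        return False


if __name__ == "__main__":
    if openrefine_is_running():
        print("Something is answering at", DEFAULT_OPENREFINE_URL)
    else:
        print("Nothing answered at", DEFAULT_OPENREFINE_URL,
              "- start OpenRefine first.")
```

If the check fails, start OpenRefine and try again. The request goes only to your own computer, so it works without an internet connection.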