Summary and Setup

Data Carpentry’s aim is to teach researchers basic concepts, skills, and tools for working with data so that they can get more done in less time, and with less pain. This workshop uses a tabular ecology dataset and teaches data cleaning, management, analysis and visualization.

Getting Started

Data Carpentry’s teaching is hands-on, so participants are encouraged to use their own computers to ensure the proper setup of tools for an efficient workflow. To use these materials most effectively, please download the data and install the required software before working through this lesson.

This workshop assumes no prior experience with the tools covered in the workshop.

To get started, follow the directions in the Setup tab to get access to the required software and data for this workshop.


The data for this workshop are the Portal Project Teaching Database, available on FigShare under a CC-BY license that permits reuse.

The Portal Project Teaching Database is a simplified version of the Portal Project Database designed for teaching. It is a tabular dataset of observations of small mammals in a desert ecosystem in Arizona, USA, collected over more than 40 years. It provides a real world example of life-history, population, and ecological data, with sufficient complexity to teach many aspects of data analysis and management, but with many complexities removed to allow students to focus on the core ideas and skills being taught.

More information on this dataset

Workshop Overview

The workshop can be taught using R or Python as the base language. All workshops start with a lesson on organizing data effectively in spreadsheets, followed by a lesson on data cleaning with OpenRefine. Each workshop will then include either a lesson on R or a lesson on Python. Both the R and Python lessons focus on data import, exploratory data analysis, and visualization. Workshops may also include a lesson on working with data in a relational database using SQL, at the discretion of the instructors.
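As a rough illustration of the import-and-summarize workflow that the R and Python lessons cover, the sketch below reads a small table and computes a mean per group using only the Python standard library. The sample rows, the column names (`record_id`, `year`, `species_id`, `weight`), and the helper `mean_weight_by_species` are hypothetical and chosen for illustration; they are not taken from the workshop data files, and the lessons themselves may use different tools.

```python
import csv
import io
from collections import defaultdict
from statistics import mean

# Hypothetical sample rows; column names and values are illustrative only,
# not copied from the workshop dataset.
SAMPLE_CSV = """record_id,year,species_id,weight
1,1990,DM,40
2,1990,DM,44
3,1990,PF,8
4,1991,PF,
5,1991,DO,50
"""


def mean_weight_by_species(csv_text):
    """Return {species_id: mean weight}, skipping rows with a missing weight."""
    weights = defaultdict(list)
    for row in csv.DictReader(io.StringIO(csv_text)):
        if row["weight"]:  # an empty string means the value is missing
            weights[row["species_id"]].append(float(row["weight"]))
    return {species: mean(values) for species, values in weights.items()}


summary = mean_weight_by_species(SAMPLE_CSV)
for species, avg in sorted(summary.items()):
    print(f"{species}: {avg:.1f}")
```

To try the same idea on a real file, replace `io.StringIO(csv_text)` with an open file handle and adjust the column names to match that file's header.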

Lesson Overview
Data Organization in Spreadsheets for Ecologists: Learn how to organize tabular data, handle date formatting, carry out quality control and quality assurance, and export data to use with downstream applications.
Data Cleaning with OpenRefine for Ecologists: Explore, summarize, and clean tabular data reproducibly.
Data Analysis and Visualization in R for Ecologists: Import data into R, calculate summary statistics, and create publication-quality graphics.
Data Analysis and Visualization with Python for Ecologists: Import data into Python, calculate summary statistics, and create publication-quality graphics.
Data Management with SQL for Ecologists: Structure data for database import. Query data within a relational database.


This workshop is designed to be run on your laptop. First, you will need to download the data we use in the workshop. Then, you need to install some software. After following the instructions on this page, you should have everything you need to participate fully in the workshop!

Setup instructions for your workshop