Data Carpentry’s aim is to teach researchers basic concepts, skills, and tools for working with data so that they can get more done in less time, and with less pain. This workshop uses a tabular ecology dataset and teaches data cleaning, management, analysis and visualization.
There are no prerequisites, and the materials assume no prior knowledge of the tools.
The data for this workshop are the Portal Project Teaching Database, available on FigShare under a CC-BY license that permits reuse.
The Portal Project Teaching Database is a simplified version of the Portal Project Database designed for teaching. It is a tabular dataset of observations of small mammals in a desert ecosystem in Arizona, USA, collected over more than 40 years. It provides a real-world example of life-history, population, and ecological data, with enough complexity to teach many aspects of data analysis and management, but with many complications removed so that students can focus on the core ideas and skills being taught.
The workshop can be taught using R or Python as the base language.
Overview of the lessons:
There are two lessons in this section. The first is a spreadsheet lesson that teaches good data organization, and some data cleaning and quality control checking in a spreadsheet program.
The second lesson uses a program called OpenRefine to teach data cleaning and filtering, and to introduce the idea of scripting (application programming interfaces).
These lessons include a basic introduction to R or Python syntax, importing CSV data, and subsetting and merging data, and finish with plotting.
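As a rough illustration of the import, subset, and merge steps named above, here is a minimal Python sketch that uses only the standard library. The column names and rows are made up for illustration and are not taken from the Portal data, and the lessons themselves may use different libraries. Plotting is left out.

```python
import csv
import io

# Made-up rows for illustration only; not values from the real dataset.
SURVEYS_CSV = """record_id,year,species_id,weight
1,1977,DM,40
2,1977,PF,8
3,1978,DM,48
"""

SPECIES_CSV = """species_id,genus
DM,Dipodomys
PF,Perognathus
"""


def read_rows(text):
    """Parse CSV text into a list of dicts keyed by the header row."""
    return list(csv.DictReader(io.StringIO(text)))


# Import: in a real session this would read from files on disk.
surveys = read_rows(SURVEYS_CSV)
species = read_rows(SPECIES_CSV)

# Subset: keep only the records from one year (CSV values are strings).
surveys_1977 = [row for row in surveys if row["year"] == "1977"]

# Merge: attach the genus to each record by matching on species_id.
genus_by_id = {row["species_id"]: row["genus"] for row in species}
merged = [
    {**row, "genus": genus_by_id[row["species_id"]]}
    for row in surveys_1977
]

for row in merged:
    print(row["record_id"], row["genus"], row["weight"])
```

The merge here is a simple dictionary lookup on a shared key, which is the same idea a data-frame merge expresses at a higher level.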
This lesson introduces the concept of a database using SQLite, how to structure data for easy database import, and how to import tabular data into SQLite. It then teaches basic queries, combining results, and querying across multiple tables.
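To make the ideas of a basic query and a query across multiple tables concrete, here is a small sketch using Python's built-in sqlite3 module with an in-memory database. The table layouts and rows are hypothetical and chosen only for illustration; the lesson may well run SQL through a different interface.

```python
import sqlite3

# An in-memory database keeps the sketch self-contained.
conn = sqlite3.connect(":memory:")

# Hypothetical, simplified tables; rows are made up for illustration.
conn.execute(
    "CREATE TABLE species (species_id TEXT PRIMARY KEY, genus TEXT)"
)
conn.execute(
    "CREATE TABLE surveys ("
    "record_id INTEGER PRIMARY KEY, year INTEGER, "
    "species_id TEXT, weight REAL)"
)
conn.executemany(
    "INSERT INTO species VALUES (?, ?)",
    [("DM", "Dipodomys"), ("PF", "Perognathus")],
)
conn.executemany(
    "INSERT INTO surveys VALUES (?, ?, ?, ?)",
    [(1, 1977, "DM", 40.0), (2, 1977, "PF", 8.0), (3, 1978, "DM", 48.0)],
)

# Basic query: filter rows from a single table.
heavy = conn.execute(
    "SELECT record_id, weight FROM surveys "
    "WHERE weight > 30 ORDER BY record_id"
).fetchall()

# Query across two tables: join on the shared key, then aggregate.
mean_weights = conn.execute(
    """
    SELECT species.genus, AVG(surveys.weight)
    FROM surveys
    JOIN species ON surveys.species_id = species.species_id
    GROUP BY species.genus
    ORDER BY species.genus
    """
).fetchall()

conn.close()

print(heavy)
print(mean_weights)
```

Keeping species details in their own table and joining on `species_id` avoids repeating the genus in every survey row, which is one reason tabular data is often split up before database import.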
There are a number of other ecology lessons that are not part of the base workshop. Some of these are no longer taught, and some are only taught at extended workshops.