
Data Carpentry Lessons

We facilitate and develop lessons for Data Carpentry workshops. These lessons are distributed under the CC-BY license and are free for re-use or adaptation, with attribution. People have used the lessons in courses, as the basis for new lessons, and for self-guided learning.

Data Carpentry workshops are domain-specific: we teach researchers the skills most relevant to their domain, using examples from their type of work. We therefore offer several types of workshops, and the curriculum is organised by domain.

Curriculum Advisors are part of a team that provides the oversight, vision, and leadership for a particular set of lessons. More information about the role of the Curriculum Advisory Committee can be found in the Carpentries Handbook.

Astronomy

The Foundations of Astronomical Data Science curriculum covers a range of core concepts necessary to efficiently study the ever-growing datasets developed in modern astronomy. This curriculum teaches learners to perform database operations (SQL queries, joins, filtering) and to create publication-quality data visualisations. This curriculum assumes some prior knowledge of Python and exposure to the Bash shell, equivalent to that taught in a Software Carpentry workshop.

Lessons

Lesson | Site | Repository | Reference | Instructor Notes | Maintainers
Foundations of Astronomical Data Science | | | | | Aycha Tammour, Michel M. Nzikou

Ecology

This workshop uses a tabular ecology dataset from the Portal Project Teaching Database and teaches data cleaning, management, analysis, and visualization. There are no pre-requisites, and the materials assume no prior knowledge about the tools. We use a single dataset throughout the workshop to model the data management and analysis workflow that a researcher would use.

The Ecology workshop can be taught using R or Python as the base language.

Lessons in English

Lesson | Site | Repository | Reference | Instructor Notes | Maintainers
Ecology Workshop Overview | | | | |
Data Organization in Spreadsheets for Ecologists | | | | | Mary Tuttle, Urmi Poddar
Data Cleaning with OpenRefine for Ecologists | | | | | Luis J Villanueva, Tajudeen Akanbi Akinosho
Data Management with SQL for Ecologists | | | | | James Foster
Data Analysis and Visualization in R for Ecologists | | | | | Carolyn Koehn, Elizabeth Stregger, Hugo Gruson
Data Analysis and Visualization in Python for Ecologists | | | | | Guppy Stott, Lilian Huang, Jose Niño Muriel

Lessons in Spanish

Lesson | Website | Repository | Reference | Instructor Guide | Maintainer(s)
Análisis y visualización de datos usando Python (Beta) | | | | | Heladia Salgado, Irene Ramos Pérez, Julieta Millan, Vini Salazar

Genomics

This workshop focuses on data management and analysis for genomics research. It covers best practices for organising bioinformatics projects and data, use of command-line utilities, use of command-line tools to analyze sequence quality and perform variant calling, and connecting to and using cloud computing.

More information about hosting and teaching a Genomics workshop can be found on our FAQ page.

Interested in teaching these materials? We have an onboarding video and accompanying slides available to prepare Instructors to teach these lessons. After watching this video, please contact team@carpentries.org so that we can record your status as an onboarded Instructor. Instructors who have completed onboarding will be given priority status for teaching at Centrally-Organised Data Carpentry Genomics workshops.

Please note that one of the lessons from the Genomics workshop material, “Intro to R and RStudio for Genomics”, is in “beta” development. This lesson is available for review and for informal teaching experiences, but is not yet part of The Carpentries’ official lesson offerings.

Lessons

Lesson | Site | Repository | Reference | Instructor Notes | Maintainers
Genomics Workshop Overview | | | | | Anuj Guruacharya
Project Organization and Management for Genomics | | | | | Aziz Khan, Larisa Soto
Introduction to the Command Line for Genomics | | | | | Alison Meynert, Aziz Khan
Data Wrangling and Processing for Genomics | | | | | Aida Miró-Herrans, Josh Herr
Introduction to Cloud Computing for Genomics | | | | | Peter Li, Renee Ng

Lessons in Development

Lesson | Site | Repository | Reference | Instructor Notes | Maintainers
Intro to R and RStudio for Genomics *beta* | | | | | Jason Williams, Naupaka Zimmerman, Yuka Takemon

Geospatial

This workshop is co-developed with the National Ecological Observatory Network (NEON). It focuses on working with geospatial data: managing and understanding spatial data formats, understanding coordinate reference systems, and working with raster and vector data in R for analysis and visualization.

Join the geospatial curriculum email list to get updates and be involved in conversations about this curriculum.

Interested in teaching these materials? We have an onboarding video and accompanying slides available to prepare Instructors to teach these lessons. After watching this video, please contact team@carpentries.org so that we can record your status as an onboarded Instructor. Instructors who have completed onboarding will be given priority status for teaching at Centrally-Organised Data Carpentry Geospatial workshops.

Lesson | Site | Repository | Reference | Instructor Notes | Maintainers
Geospatial Workshop Overview | | | | |
Introduction to Geospatial Concepts | | | | | Rohit Goswami
Introduction to R for Geospatial Data | | | | | Alber Sánchez, Cooper Kimball-Rhines, Johanna Bayer, Kristi Liu
Introduction to Geospatial Raster and Vector Data with R | | | | | Ivo Arrey, Jon Jablonski

Image Processing

This workshop uses Python and a variety of example images to teach the foundational concepts of image processing, and the skills needed to programmatically extract information from image data. The current version of the curriculum was developed from material originally created by Dr. Tessa Durham Brooks and Dr. Mark Meysenburg at Doane College, Nebraska, USA, with support from an NSF iUSE grant. Further development of the curriculum was supported by a grant from the Sloan Foundation.

Join the image processing curriculum email list and/or the dc-image-processing channel on The Carpentries Slack workspace to get updates and be involved in conversations about this curriculum.

Lesson | Site | Repository | Reference | Instructor Notes | Maintainers
Image Processing with Python | | | | | Kimberly Meechan, Marco Dalla Vecchia, Toby Hodges, Ulf Schiller

Social Science

This workshop uses a tabular interview dataset from the SAFI Teaching Database and teaches data cleaning, management, analysis and visualization. There are no pre-requisites, and the materials assume no prior knowledge about the tools. We use a single dataset throughout the workshop to model the data management and analysis workflow that a researcher would use.

The Social Sciences workshop can be taught using R as the base language.

Interested in teaching these materials? We have an onboarding video and accompanying slides available to prepare Instructors to teach these lessons. After watching this video, please contact team@carpentries.org so that we can record your status as an onboarded Instructor. Instructors who have completed onboarding will be given priority status for teaching at Centrally-Organised Data Carpentry Social Sciences workshops.

Please note that workshop materials for working with Social Science data in Python and SQL are under development.

Lessons

Lesson | Site | Repository | Reference | Instructor Notes | Maintainers
Social Science Workshop Overview | | | | | Jesse Sadler, Johanna Bayer
Data Organization in Spreadsheets for Social Scientists | | | | | Allie Tatarian, Jose Niño Muriel
Data Cleaning with OpenRefine for Social Scientists | | | | | Ben Companjen, Marijane White
Data Analysis and Visualization with R for Social Scientists | | | | | Jesse Sadler, Juan Fung

Lessons in Development

Lesson | Site | Repository | Reference | Instructor Notes | Maintainers
Data Analysis and Visualization with Python for Social Scientists *alpha* | | | | |
Data Management with SQL for Social Scientists *alpha* | | | | |