Genomics Workshop

Data Carpentry’s aim is to teach researchers basic concepts, skills, and tools for working with data so that they can get more done in less time, and with less pain. This workshop teaches data management and analysis for genomics research, including best practices for organizing bioinformatics projects and data, use of command-line utilities, use of command-line tools to analyze sequence quality and perform variant calling, and connecting to and using cloud computing. This workshop is designed to be taught over two full days of instruction.

Getting Started

This lesson assumes no prior experience with the tools covered in the workshop. However, learners are expected to have some familiarity with biological concepts, including nucleotide abbreviations and the concept of genomic variation within a population. Participants should bring their laptops and plan to participate actively.

To get started, follow the directions in the Setup tab to get access to the required software and data for this workshop.

Please note that workshop materials for working with genomics data in R are under development and will become available in June 2018.


This workshop uses data from a long-term evolution experiment published in 2012: "Genomic analysis of a key innovation in an experimental Escherichia coli population" by Blount ZD, Barrick JE, Davidson CJ, and Lenski RE (doi: 10.1038/nature11514).

More information about these data will be presented in the first lesson of the workshop.

Workshop Overview

Project organization and management: Learn how to structure your metadata, organize and document your genomics data and bioinformatics workflow, and access data on the NCBI Sequence Read Archive (SRA) database.
Introduction to the command line: Learn to navigate your file system; create, copy, move, and remove files and directories; and automate repetitive tasks using scripts and wildcards.
Data wrangling and processing: Use command-line tools to perform quality control, align reads to a reference genome, and identify and visualize between-sample variation.
Introduction to cloud computing for genomics: Learn how to work with Amazon Web Services (AWS) cloud computing and how to transfer data between your local computer and cloud resources.

Teaching Platform

This workshop is designed to be run on pre-imaged Amazon Web Services (AWS) instances. All the software and data used in the workshop are hosted on an Amazon Machine Image (AMI). If you want to run your own instance of the server used for this workshop, follow the directions in the Setup tab.

