Genomics Workshop Overview: Setup

Overview

This workshop is designed to be run on pre-imaged Amazon Web Services (AWS) instances. All the software and data used in the workshop are hosted on an Amazon Machine Image (AMI).

Option A: Using the lessons with Amazon Web Services (AWS)

To run your own instance of the server used for this workshop, launch a t2.medium instance in the N. Virginia region with AMI ami-373ab74d “Data Carpentry Genomics release 1.0”, available under “Community AMIs” in the Amazon EC2 Management Console.
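
For readers who prefer the command line to the EC2 Management Console, the launch described above can be sketched with the AWS CLI. This is only an illustration: it assumes the AWS CLI is installed and configured with your credentials, and `KEY_NAME` (default `my-key-pair`) is a hypothetical placeholder for an EC2 key pair you already own. The sketch prints the command for review instead of running it.

```shell
# Print (rather than run) an AWS CLI command that would launch the workshop AMI.
AMI_ID="ami-373ab74d"                 # AMI named in the text above
INSTANCE_TYPE="t2.medium"             # instance type named in the text above
REGION="us-east-1"                    # the N. Virginia region
KEY_NAME="${KEY_NAME:-my-key-pair}"   # hypothetical key pair name; replace with your own

launch_command() {
    printf 'aws ec2 run-instances --region %s --image-id %s --instance-type %s --key-name %s --count 1\n' \
        "$REGION" "$AMI_ID" "$INSTANCE_TYPE" "$KEY_NAME"
}

launch_command
```

Copy the printed command into your terminal once you have checked the values. Remember to terminate the instance when you are finished so that it stops accruing charges.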

If you are taking a Genomics Data Carpentry workshop, instances will be set up for you. To connect, follow the instructions on connecting to Data Carpentry Genomics Amazon instances.

If you’re an instructor or maintainer, or want to contribute to these lessons, please get in touch with us at team@carpentries.org and we will start instances for you.

You can also start your own instance if you’re using these lessons for self-guided learning; see the information on creating an Amazon instance. The cost of using this AMI for a few days with the t2.medium instance type is very low.

Option B: Using the lessons on your local machine

While not recommended, it is possible to work through the lessons on your local machine (i.e. without using AWS). To do this, you will need to install all of the software used in the workshop and obtain a copy of the dataset. Instructions for doing this are below.

Data

The data used in this workshop is available on the Open Science Framework (OSF). Because this workshop works with real data, be aware that file sizes for the data are large.

https://osf.io/ycu8j/

This includes the data used in the exercises as well as solutions to the exercises. The solutions are useful if you start at a later module and need the output of earlier exercises.

There are two directories:

You can also access the data by starting the Amazon AMI that has the data.

Software

| Software | Install | Manual | Available for | Description |
|---|---|---|---|---|
| FastQC | Link | Link | Linux, MacOS, Windows | Quality control tool for high throughput sequence data. |
| Trimmomatic | Link | Link | Linux, MacOS, Windows | A flexible read trimming tool for Illumina NGS data. |
| BWA | Link | Link | Linux, MacOS | Mapping DNA sequences against reference genome. |
| SAMtools | Link | Link | Linux, MacOS | Utilities for manipulating alignments in the SAM format. |
| BCFtools | Link | Link | Linux, MacOS | Utilities for variant calling and manipulating VCFs and BCFs. |
| IGV | Link | Link | Linux, MacOS, Windows | Visualization and interactive exploration of large genomics datasets. |

QuickStart Software Installation Instructions

These are the QuickStart installation instructions. They assume familiarity with the command line and with installation in general. As there are different operating systems and many different versions of operating systems and environments, these may not work on your computer. If an installation doesn’t work for you, please refer to the installation instructions for that software, listed in the table above.
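
Before installing anything, it can help to see which tools are already available. The following is a minimal sketch, assuming the executables use the lowercase names shown in the test commands later on this page. IGV is left out because it is usually started as a desktop application.

```shell
# Report which of the given command-line tools are already on the PATH.
# The exit status is the number of tools that were not found.
check_tools() {
    missing=0
    for tool in "$@"; do
        if command -v "$tool" >/dev/null 2>&1; then
            echo "found:   $tool"
        else
            echo "missing: $tool"
            missing=$((missing + 1))
        fi
    done
    return "$missing"
}

check_tools fastqc trimmomatic bwa samtools bcftools \
    || echo "Some tools still need to be installed."
```

A tool reported as missing may still be installed but not on your PATH, so check that before reinstalling.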

FastQC

MacOS

To install FastQC, type:

$ brew install fastqc

or

$ conda install -y fastqc

FastQC Source Code Installation

If you prefer to install from source, follow the directions below:

$ cd ~/src
$ curl -O http://www.bioinformatics.babraham.ac.uk/projects/fastqc/fastqc_v0.11.5.zip
$ unzip fastqc_v0.11.5.zip

Link the fastqc executable to the ~/bin folder that you have already added to the path.

$ ln -sf ~/src/FastQC/fastqc ~/bin/fastqc

Due to what seems to be a packaging error, the executable flag on the fastqc program is not set, so we need to set it ourselves.

$ chmod +x ~/bin/fastqc

Test your installation by running:

$ fastqc -h
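
The source installation steps on this page assume that ~/src and ~/bin already exist and that ~/bin is on your PATH. A minimal sketch for preparing them is shown below. The function names `on_path` and `prepare_dirs` are made up for this illustration.

```shell
# Succeed if the directory given as $1 is one of the entries in PATH.
on_path() {
    case ":$PATH:" in
        *":$1:"*) return 0 ;;
        *)        return 1 ;;
    esac
}

# Create src/ and bin/ under the base directory $1 and report on the PATH.
prepare_dirs() {
    base="$1"
    mkdir -p "$base/src" "$base/bin" || return 1
    if on_path "$base/bin"; then
        echo "$base/bin is on the PATH"
    else
        echo "$base/bin is not on the PATH yet"
    fi
}
```

Running `prepare_dirs "$HOME"` creates both directories if needed and tells you whether ~/bin still has to be added to your PATH.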

Trimmomatic

MacOS

$ brew install trimmomatic

or

$ conda install -y trimmomatic

Trimmomatic Source Code Installation

If you prefer to install from source, follow the directions below:

$ cd ~/src
$ curl -O http://www.usadellab.org/cms/uploads/supplementary/Trimmomatic/Trimmomatic-0.36.zip
$ unzip Trimmomatic-0.36.zip

The program can be invoked via:

$ java -jar ~/src/Trimmomatic-0.36/trimmomatic-0.36.jar

The ~/src/Trimmomatic-0.36/adapters/ directory contains Illumina specific adapter sequences.

$ ls ~/src/Trimmomatic-0.36/adapters/

Test your installation by running the following (assuming everything is installed in ~/src):

$ java -jar ~/src/Trimmomatic-0.36/trimmomatic-0.36.jar

Simplify the Invocation

To simplify the invocation you could also create a script in the ~/bin folder:

$ echo '#!/bin/bash' > ~/bin/trimmomatic
$ echo 'java -jar ~/src/Trimmomatic-0.36/trimmomatic-0.36.jar "$@"' >> ~/bin/trimmomatic
$ chmod +x ~/bin/trimmomatic

Test your script by running:

$ trimmomatic
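
The same launcher can be written with a here-document, which avoids nesting quotes inside `echo`. The sketch below is one possible approach, not part of the official instructions, and the function name `make_jar_wrapper` is made up for this illustration.

```shell
# Write a small launcher script for a Java .jar file.
# Usage: make_jar_wrapper /path/to/tool.jar /path/to/launcher
make_jar_wrapper() {
    jar="$1"
    launcher="$2"
    # In this unquoted here-document, $jar is expanded now, while \$@ is
    # written out as a literal $@ for the launcher to expand when it runs.
    cat > "$launcher" <<EOF
#!/bin/bash
# Forward all arguments unchanged; the quotes keep arguments with spaces intact.
exec java -jar "$jar" "\$@"
EOF
    chmod +x "$launcher"
}
```

For the Trimmomatic install above, this would be called as `make_jar_wrapper ~/src/Trimmomatic-0.36/trimmomatic-0.36.jar ~/bin/trimmomatic`.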

BWA

MacOS

$ brew install bwa

or

$ conda install -y bwa

BWA Source Code Installation

If you prefer to install from source, follow the instructions below:

$ cd ~/src
$ curl -OL http://sourceforge.net/projects/bio-bwa/files/bwa-0.7.15.tar.bz2
$ tar jxvf bwa-0.7.15.tar.bz2
$ cd bwa-0.7.15
$ make
$ export PATH=~/src/bwa-0.7.15:$PATH

Test your installation by running:

$ bwa
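
Note that the `export PATH=...` command above only affects the current shell session. To make the change permanent, the line can be appended to your shell startup file. The sketch below appends the line only if it is not already there, so it is safe to run more than once. The function name `add_to_path_once` is made up for this illustration, and ~/.bashrc is an assumption: if your shell is zsh (the default on recent versions of MacOS), the startup file is ~/.zshrc.

```shell
# Append an export line for a directory to a startup file, but only once.
# Usage: add_to_path_once DIRECTORY STARTUP_FILE
add_to_path_once() {
    line="export PATH=\"$1:\$PATH\""
    touch "$2"
    # -x matches the whole line, -F treats the pattern as a fixed string.
    if grep -qxF "$line" "$2"; then
        echo "already present in $2"
    else
        echo "$line" >> "$2"
        echo "added to $2"
    fi
}
```

For BWA this would be `add_to_path_once "$HOME/src/bwa-0.7.15" "$HOME/.bashrc"`, followed by opening a new terminal or running `source ~/.bashrc`.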

SAMtools

MacOS

$ brew install samtools

or

$ conda install -y samtools

SAMtools Versions

SAMtools has changed its command-line invocation (for the better), which means that many tutorials on the web still show an older, obsolete usage.

Use only SAMtools 1.3 or later.
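
To check whether an existing installation meets this requirement, the version can be compared in the shell. This is a minimal sketch: it assumes that the first line of `samtools --version` looks like `samtools 1.3`, and it compares only the major and minor numbers. The helper names are made up for this illustration. Very old releases that do not understand `--version` yield an empty version and are reported as too old.

```shell
# Leading digits of the major part of a version string, or 0 if absent.
major_of() { m="${1%%[!0-9]*}"; echo "${m:-0}"; }

# Leading digits of the minor part of a version string, or 0 if absent.
minor_of() {
    case "$1" in
        *.*) rest="${1#*.}"; n="${rest%%[!0-9]*}"; echo "${n:-0}" ;;
        *)   echo 0 ;;
    esac
}

# Succeed if version $1 is at least version $2 (major.minor only).
version_at_least() {
    have_major="$(major_of "$1")"; have_minor="$(minor_of "$1")"
    want_major="$(major_of "$2")"; want_minor="$(minor_of "$2")"
    [ "$have_major" -gt "$want_major" ] && return 0
    [ "$have_major" -eq "$want_major" ] && [ "$have_minor" -ge "$want_minor" ]
}

if command -v samtools >/dev/null 2>&1; then
    installed="$(samtools --version 2>/dev/null | head -n 1 | cut -d ' ' -f 2)"
    if version_at_least "$installed" 1.3; then
        echo "samtools $installed is recent enough"
    else
        echo "samtools ${installed:-(unknown version)} is too old; install 1.3 or later"
    fi
else
    echo "samtools was not found on the PATH"
fi
```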

SAMtools Source Code Installation

If you prefer to install from source, follow the instructions below:

$ cd ~/src
$ curl -OkL https://github.com/samtools/samtools/releases/download/1.3/samtools-1.3.tar.bz2
$ tar jxvf samtools-1.3.tar.bz2
$ cd samtools-1.3
$ make

Add directory to the path if necessary:

$ echo 'export PATH=~/src/samtools-1.3:$PATH' >> ~/.bashrc
$ source ~/.bashrc

Test your installation by running:

$ samtools

BCFtools

MacOS

$ brew install bcftools

or

$ conda install -y bcftools

BCFtools Source Code Installation

If you prefer to install from source, follow the instructions below:

$ cd ~/src
$ curl -OkL https://github.com/samtools/bcftools/releases/download/1.5/bcftools-1.5.tar.bz2
$ tar jxvf bcftools-1.5.tar.bz2
$ cd bcftools-1.5
$ make

Add directory to the path if necessary:

$ echo 'export PATH=~/src/bcftools-1.5:$PATH' >> ~/.bashrc
$ source ~/.bashrc

Test your installation by running:

$ bcftools

IGV