Data Wrangling and Processing for Genomics: Glossary

Key Points

Background and Metadata
  • It is important to record and understand your experiment’s metadata.

Assessing Read Quality
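  • Quality encodings vary across sequencing platforms and instrument versions (for example, Phred+33 versus the older Phred+64), so check which encoding your FASTQ files use.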

  • For loops let you perform the same set of operations on multiple files with a single command.
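
A minimal bash sketch of both points, assuming the Phred+33 encoding (ASCII code minus 33) used by Illumina 1.8 and later. The hypothetical helper phred33 converts one quality character to a score, and a for loop repeats the same steps (count reads, decode the quality of the first base) for every FASTQ file in a directory. The files sampleA.fastq and sampleB.fastq are invented for the demo and are created in a temporary directory.

```shell
#!/usr/bin/env bash
# Sketch: decode Phred+33 quality characters and run the same steps
# on every FASTQ file with a for loop. File names are hypothetical.

workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Convert one quality character to a Phred score, assuming Phred+33.
phred33() {
    local ascii
    ascii=$(printf '%d' "'$1")   # ASCII code of the character
    echo $(( ascii - 33 ))
}

# Create two tiny example FASTQ files (4 lines per read).
printf '@read1\nACGT\n+\nII#5\n' > sampleA.fastq
printf '@read1\nGGCA\n+\n!I+I\n@read2\nTTAA\n+\nIIII\n' > sampleB.fastq

for filename in *.fastq
do
    # Each read occupies exactly 4 lines
    reads=$(( $(wc -l < "$filename") / 4 ))
    # Line 4 holds the quality string of the first read
    first_qual=$(sed -n '4p' "$filename")
    echo "$filename: reads=$reads, first-base Q=$(phred33 "${first_qual:0:1}")"
done
```

For these two files the loop prints sampleA.fastq: reads=1, first-base Q=40 and sampleB.fastq: reads=2, first-base Q=0. If the files used Phred+64 instead, the same characters would decode to scores 31 lower, which is why the encoding has to be known before interpreting quality.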

Trimming and Filtering
  • The options you set for the command-line tools you use are important: the same data can give different results with different parameter values.

  • Data cleaning is an essential step in a genomics workflow.
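
The effect of a single option can be seen with a toy length filter. This is a hedged awk sketch, not Trimmomatic: the hypothetical function filter_by_length keeps only reads whose sequence has at least a given number of bases, loosely mirroring the idea behind Trimmomatic's MINLEN step. The three reads in input.fastq are invented.

```shell
#!/usr/bin/env bash
# Sketch: drop FASTQ reads shorter than a minimum length.
# Illustrative only; this is not Trimmomatic.

workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Hypothetical input: three reads of length 8, 4 and 6.
printf '@r1\nACGTACGT\n+\nIIIIIIII\n@r2\nACGT\n+\nIIII\n@r3\nACGTAC\n+\nIIIIII\n' > input.fastq

# filter_by_length FILE MIN_LEN: print only records whose sequence
# (line 2 of each 4-line record) has at least MIN_LEN bases.
filter_by_length() {
    awk -v min="$2" '
        { rec[NR % 4] = $0 }          # buffer the current 4-line record
        NR % 4 == 0 {                 # the record is complete on every 4th line
            if (length(rec[2]) >= min)
                print rec[1] "\n" rec[2] "\n" rec[3] "\n" rec[0]
        }' "$1"
}

# The same data gives different results depending on the option chosen.
for min in 5 7
do
    kept=$(( $(filter_by_length input.fastq "$min" | wc -l) / 4 ))
    echo "MIN_LEN=$min keeps $kept of 3 reads"
done
```

With these reads, a minimum of 5 keeps 2 of the 3 reads and a minimum of 7 keeps only 1, so the cleaned output depends directly on the value chosen.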

Variant Calling Workflow
  • Bioinformatic command-line tools are collections of commands that can be used to carry out bioinformatic analyses.

  • To use most of the powerful bioinformatic tools, you will need to work at the command line.

  • There are many different file formats for storing genomics data. It is important to understand what type of information is contained in each file, and how it was derived.
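
A small sketch of inspecting one such format, VCF, with standard grep and awk. The file example.vcf and its two variants are invented; the sketch relies only on the VCF conventions that header lines begin with # and that the first five tab-separated columns are CHROM, POS, ID, REF and ALT.

```shell
#!/usr/bin/env bash
# Sketch: inspect a small, made-up VCF file.

workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Two header lines followed by two variant records (tab-separated).
printf '##fileformat=VCFv4.2\n#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\nchr1\t100\t.\tA\tG\t50\tPASS\tDP=20\nchr1\t250\t.\tC\tT\t35\tPASS\tDP=12\n' > example.vcf

# Count variant records: every line that is not a header line.
n_variants=$(grep -vc '^#' example.vcf)
echo "variants: $n_variants"

# Print chromosome:position and the REF>ALT change for each variant.
awk -F '\t' '!/^#/ { print $1 ":" $2, $4 ">" $5 }' example.vcf
```

This prints variants: 2, followed by chr1:100 A>G and chr1:250 C>T.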

Automating a Variant Calling Workflow
  • We can combine multiple commands into a shell script to automate a workflow.

  • Use echo statements within your scripts to print progress updates as the script runs.
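
A minimal sketch of an automated workflow script with echo progress messages. The two steps are placeholders (read counting and a summary file) standing in for the real alignment and variant-calling commands, and the directory layout and sample names are hypothetical. The line set -euo pipefail makes the script stop at the first failing command instead of continuing with missing inputs.

```shell
#!/usr/bin/env bash
# Sketch of a workflow script with progress messages.
# The steps are placeholders, not real alignment or variant-calling
# commands; the directory layout and sample names are hypothetical.
set -euo pipefail   # stop at the first failing command

workdir=$(mktemp -d)
cd "$workdir"
mkdir -p data results

# Create one tiny hypothetical FASTQ file per sample.
for sample in sampleA sampleB
do
    printf '@r1\nACGT\n+\nIIII\n' > "data/${sample}.fastq"
done

count=0
for fq in data/*.fastq
do
    base=$(basename "$fq" .fastq)
    echo "working with sample $base"

    echo "  step 1: counting reads"
    reads=$(( $(wc -l < "$fq") / 4 ))

    echo "  step 2: writing summary"
    printf '%s\t%s\n' "$base" "$reads" > "results/${base}.summary.tsv"

    count=$((count + 1))
done

echo "done: $count summaries written to $workdir/results"
```

Because each step announces itself before it runs, the last message printed shows which sample and step a failed run stopped at.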