Data Wrangling and Processing for Genomics: Glossary

Key Points

Assessing Read Quality
  • Quality encodings vary across sequencing platforms.

  • for loops let you perform the same set of operations on multiple files with a single command.
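
As a minimal sketch of both points, the loop below runs the same steps on every FASTQ file in a directory and decodes one quality character per file. The demo files, the helper name phred33_score, and the choice of Phred+33 encoding are assumptions made for illustration; data from a platform that uses a different offset would need a different subtraction.

```shell
# Create two tiny demo FASTQ files in a temporary directory (hypothetical data).
demo_dir=$(mktemp -d)
printf '@read1\nACGT\n+\nIIII\n' > "$demo_dir/sample_a.fastq"
printf '@read1\nACGT\n+\n!!!!\n@read2\nTTGA\n+\n5555\n' > "$demo_dir/sample_b.fastq"

# Convert one quality character to a Phred score, ASSUMING Phred+33 encoding.
phred33_score() {
    # A leading quote makes printf emit the character's ASCII code.
    ascii=$(printf '%d' "'$1")
    echo $(( ascii - 33 ))
}

# Run the same operations on every FASTQ file with a single loop.
for fastq in "$demo_dir"/*.fastq; do
    name=$(basename "$fastq" .fastq)
    # Each FASTQ record is four lines, so reads = lines / 4.
    reads=$(( $(wc -l < "$fastq") / 4 ))
    # Line 4 holds the quality string of the first read; take its first character.
    first_char=$(sed -n '4p' "$fastq" | cut -c1)
    echo "$name: $reads read(s), first base quality Q$(phred33_score "$first_char")"
done
```

With the demo data, the loop reports Q40 for sample_a (character I, ASCII 73) and Q0 for sample_b (character !, ASCII 33).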

Trimming and Filtering
  • The options you set for a command-line tool determine what it does to your data, so choose them deliberately!

  • Data cleaning is an essential step in a genomics workflow.
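
To illustrate how much a single option can change the outcome, here is a rough sketch of a minimum-length filter written with awk. It is a stand-in for a real trimming tool, not a replacement for one; the function name filter_min_len, the thresholds, and the demo reads are all hypothetical.

```shell
# Hypothetical demo data: three reads of length 8, 5, and 3.
demo_fastq=$(mktemp)
printf '@r1\nACGTACGT\n+\nIIIIIIII\n@r2\nACGTA\n+\nIIIII\n@r3\nACG\n+\nIII\n' > "$demo_fastq"

# Keep only reads that are at least min_len bases long.
# Usage: filter_min_len MIN_LEN FILE
# Each FASTQ record is four lines: header, sequence, '+', quality.
filter_min_len() {
    awk -v min_len="$1" '
        NR % 4 == 1 { header = $0 }
        NR % 4 == 2 { bases = $0 }
        NR % 4 == 3 { plus = $0 }
        NR % 4 == 0 {
            if (length(bases) >= min_len)
                print header "\n" bases "\n" plus "\n" $0
        }
    ' "$2"
}

# The same input gives different results under different option values.
for min_len in 4 6; do
    kept=$(( $(filter_min_len "$min_len" "$demo_fastq" | wc -l) / 4 ))
    echo "min_len=$min_len keeps $kept of 3 reads"
done
```

Raising the threshold from 4 to 6 drops the number of surviving reads from 2 to 1, which is why option values are worth recording alongside the results.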

Variant Calling Workflow
  • Bioinformatics command-line tools are collections of commands that can be used to carry out bioinformatics analyses.

  • To use most of the powerful bioinformatics tools, you’ll need to use the command line.

  • There are many different file formats for storing genomics data. It’s important to understand these file formats and know how to convert among them.
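
As one small example of converting between formats, the sketch below turns FASTQ records into FASTA records by keeping the header and sequence lines and dropping the quality information. The function name fastq_to_fasta and the demo reads are hypothetical, and the sketch assumes each record occupies exactly four lines.

```shell
# Hypothetical demo data: two four-line FASTQ records.
demo_fastq=$(mktemp)
printf '@read1 extra\nACGT\n+\nIIII\n@read2\nTTGA\n+\n5555\n' > "$demo_fastq"

# Convert FASTQ to FASTA, ASSUMING strict four-line records.
# Line 1 of each record becomes the '>' header; line 2 is the sequence.
# Lines 3 and 4 (separator and qualities) have no FASTA equivalent and are dropped.
fastq_to_fasta() {
    awk 'NR % 4 == 1 { print ">" substr($0, 2) }
         NR % 4 == 2 { print }' "$1"
}

fastq_to_fasta "$demo_fastq"
```

The conversion is one-way: FASTA has no place to store quality scores, so the original FASTQ file should be kept.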

Automating a Variant Calling Workflow
  • We can combine multiple commands into a shell script to automate a workflow.

  • Use echo statements within your scripts to get an automated progress update.
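
A minimal sketch of such a script is shown below. It is not the variant calling workflow itself: the two steps, the file names, and the temporary working directory are placeholders chosen so the script runs anywhere, and real analysis commands would take their place.

```shell
#!/bin/bash
# Stop at the first failing command so later steps never run on bad input.
set -e

# Hypothetical working directory and input file for the demonstration.
work_dir=$(mktemp -d)
printf '@read1\nACGT\n+\nIIII\n' > "$work_dir/sample.fastq"

# Each echo announces a step before it starts, giving a running progress report.
echo "Step 1: counting reads in sample.fastq"
reads=$(( $(wc -l < "$work_dir/sample.fastq") / 4 ))

echo "Step 2: writing summary"
echo "sample.fastq: $reads read(s)" > "$work_dir/summary.txt"

echo "Done. Summary written to $work_dir/summary.txt"
```

Saved to a file and run with bash, the script prints one line per step, so a long run shows how far it has progressed and, if it stops early, which step failed.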