On January 24-25 (Thursday and Friday), we held a Data Carpentry Genomics workshop at UC Davis, led by Sue McClatchy. The workshop “sold out” about a day after registration opened, which tells us we should host it again!

Overall, it was a great workshop. The learning workflow included:

Data organization and management -> Introduction to the shell -> Downloading sequencing data files -> QC (fastqc and trimming) -> Assembly -> Variant calling

Data Carpentry Genomics lessons are great

I’m a big fan of the DC genomics lessons. While I do enjoy the traditional Software Carpentry lessons (git, Python, bash), my field is genomics. I worked as a bioinformatician at a core sequencing facility before returning to graduate school at UC Davis, and I can’t emphasize enough how much the smoothness of a sequencing project depends on planning one’s data and sample sheets with the idea in mind that others (including computers) will be reading them.

While they may sound boring and obvious, the opening sections on Project Organization and Data Management are particularly crucial for the people on the other end: facility staff who receive sample sheets from customers submitting samples for sequencing, then have to demultiplex the Illumina data and deliver sequencing files based on those spreadsheets.
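To make the sample-sheet point concrete, here is a minimal Python sketch of the kind of check a facility might run before demultiplexing. It assumes a hypothetical CSV layout with only `sample_id` and `index` columns (real facility and Illumina templates have their own, richer formats), and it flags the problems that tend to cause trouble downstream: spaces or odd characters in sample names, duplicate sample IDs, and index sequences that are not plain ACGT or that collide with another sample.

```python
import csv
import io
import re

# Hypothetical column names; real sample sheet templates differ by facility.
REQUIRED_COLUMNS = ("sample_id", "index")
# Letters, digits, underscore and hyphen only: no spaces or special characters.
SAFE_ID = re.compile(r"^[A-Za-z0-9_-]+$")


def check_sample_sheet(text):
    """Return a list of human-readable problems found in a CSV sample sheet.

    Assumes one record per line, so reported line numbers match the file.
    """
    reader = csv.DictReader(io.StringIO(text))
    missing = [c for c in REQUIRED_COLUMNS if c not in (reader.fieldnames or [])]
    if missing:
        return [f"missing column(s): {', '.join(missing)}"]

    problems = []
    seen_ids = set()
    seen_indexes = set()
    for line_no, row in enumerate(reader, start=2):  # line 1 is the header
        # Short rows yield None for missing fields, so fall back to "".
        sample_id = (row["sample_id"] or "").strip()
        index = (row["index"] or "").strip().upper()

        if not SAFE_ID.match(sample_id):
            problems.append(f"line {line_no}: unsafe sample_id {sample_id!r}")
        if sample_id in seen_ids:
            problems.append(f"line {line_no}: duplicate sample_id {sample_id!r}")
        if not re.fullmatch(r"[ACGT]+", index):
            problems.append(f"line {line_no}: invalid index {index!r}")
        if index in seen_indexes:
            problems.append(f"line {line_no}: duplicate index {index!r}")

        seen_ids.add(sample_id)
        seen_indexes.add(index)
    return problems


sheet = "sample_id,index\nmouse_1,ACGTACGT\nmouse 2,ACGTACGT\n"
for problem in check_sample_sheet(sheet):
    print(problem)
```

On the small example sheet, this reports that `mouse 2` contains a space and that both samples share the same index, exactly the sort of issue that is far cheaper to catch in the spreadsheet than after a sequencing run.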

Bioinformatics as a discipline is on the rise; however, not all researchers are going to go to work for Google or build a research program based solely on computational science or bioinformatics. The researchers generating the sequencing data are the ones who know the most about it. So putting skills in their hands, rather than relying on external data analysis collaborators who may be less invested in the data or less qualified to interpret the analysis, is critical to making discoveries in proportion to the amount of data being collected. Despite this large need, data science skills are still rarely taught as a routine part of the undergraduate biology curriculum.

From Tracy Teal’s New Year message to the newly-united Carpentries:

“….it’s so crucial to democratize data skills, scaling who has access to training and creating a community of practice that values not just the tools, but the people who use them and teach them.”

There is a need for bioinformatics training. Data Carpentry can help!

Workshop participant stats:

In the post-workshop survey, >70% of participants agreed or strongly agreed that they could now:

All survey respondents felt comfortable learning in this workshop environment.

Write-in positive responses included:

Wishes included:

Here are a few of my pluses and wishes from the workshop.


Participants from this workshop have since attended our weekly “Meet and Analyze Data” (MAD) sessions, held Thursdays 3-5pm in the Bennet Conference Room at the Center for Companion Animal Health, to work on their data and ask questions. This was a great outcome.

The ‘Wrangling Genomics’ workflow, culminating in genome assembly and variant calling, used a Lenski data set: 12 populations of E. coli propagated in the long-term evolution experiment (LTEE). Although this is an old dataset with 35 bp reads from the Illumina GAIIx (Solexa), and no one will be generating new data with reads this short, the lesson still drives home the point that we can QC, assemble, and call variants in a very short time during a workshop! Seeing a workflow run from beginning to end can be eye-opening for participants.

During the workshop, we caught a few typos in the lessons, so we submitted pull requests and contributed to issues (Issue #56, Pull request #115, Issue #111, Issue #112, Issue #113, Pull request #114, Pull request #134). We got some helpful feedback from the friendly global DC genomics community. It was nice to meet and interact with people on GitHub!



