File Organization: Organization

Overview

Teaching: 30 min
Exercises: 10 min
Questions
  • What are the common file organization errors?

  • What are best practices for file organization?

Objectives
  • Highlight common SNAFUs

A place for everything, everything in its place - Benjamin Franklin




Data analysis workflow



Face it…


Mighty weapon


Organizing your data analysis workflow

Raw data $\rightarrow$ data

Pick a strategy, any strategy, just pick one and stick to it!



Data $\rightarrow$ results

Pick a strategy, any strategy, just pick one and stick to it!




A real (and imperfect!) example



Data

Ready to analyze data:



Raw data:



Analysis and figures

R scripts + the Markdown files from “Compile Notebook”:



The figures created in those R scripts and linked in those Markdown files:



Scripts

A linear progression of R scripts, plus a Makefile to run the entire analysis:

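The "linear progression" idea can be sketched in a few lines: if every script name starts with a number, the run order falls out of the file names. This is only an illustration in Python, not the Makefile used in the example project. The helper names `run_order` and `commands` are hypothetical, and the script names are the ones listed later in this lesson. `Rscript` is the standard command-line front end for running an R script.

```python
# Minimal sketch (assumption: scripts are named NN_description.r,
# where NN is the position in the pipeline).

def run_order(script_names):
    """Sort scripts by their numeric prefix, not alphabetically."""
    return sorted(script_names, key=lambda name: int(name.split("_", 1)[0]))


def commands(script_names):
    """Build (but do not execute) one Rscript command per script, in order."""
    return [["Rscript", name] for name in run_order(script_names)]


scripts = [
    "90_limma-model-term-name-fiasco.r",
    "03_dea-with-limma-voom.r",
    "01_marshal-data.r",
    "04_explore-dea-results.r",
    "02_pre-dea-filtering.r",
]

for cmd in commands(scripts):
    print(" ".join(cmd))
```

To actually run the pipeline, each command could be passed to `subprocess.run(cmd, check=True)`. A Makefile does more than this sketch, since it can also skip steps whose outputs are already up to date.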


Results

Tab-delimited files with one row per gene, holding parameter estimates, test statistics, etc.:

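As a rough illustration of the "one row per gene" layout, the sketch below writes and re-reads a tab-delimited table with Python's standard `csv` module. The column names (`gene`, `estimate`, `t_statistic`) and the two rows are made-up placeholders, not values from the example project.

```python
import csv
import io

# Hypothetical column names; the real result files may use different ones.
FIELDS = ["gene", "estimate", "t_statistic"]


def write_tsv(rows, fieldnames, handle):
    """Write one row per gene, tab-delimited, with a header line."""
    writer = csv.DictWriter(
        handle, fieldnames=fieldnames, delimiter="\t", lineterminator="\n"
    )
    writer.writeheader()
    writer.writerows(rows)


def read_tsv(handle):
    """Read a tab-delimited table back into a list of dicts (all values are strings)."""
    return list(csv.DictReader(handle, delimiter="\t"))


# Placeholder rows, purely illustrative.
rows = [
    {"gene": "gene_A", "estimate": "0.5", "t_statistic": "2.1"},
    {"gene": "gene_B", "estimate": "-1.2", "t_statistic": "-3.4"},
]

buffer = io.StringIO()          # stands in for a file under results/
write_tsv(rows, FIELDS, buffer)
buffer.seek(0)
round_trip = read_tsv(buffer)
```

Plain tab-delimited text keeps the results readable by R, by a spreadsheet, or by a collaborator's tool of choice, which is part of why it works well as an output format.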


Expository files

Files to help collaborators understand the model we fit: some Markdown docs, a Keynote presentation, and the Keynote slides exported as PNGs so they can be viewed on GitHub:



Caveats / problems with this example


Wins of this example

GOOD ENOUGH!


Other tips

Tip: the from_joe directory


Tip: give yourself less rope


Tip: prose


Tip: life cycle of data

Here’s how most data analyses go down in reality:


Prepare data $\rightarrow$ Do stats $\rightarrow$ Make tables & figs

The R scripts:

01_marshal-data.r
02_pre-dea-filtering.r
03_dea-with-limma-voom.r
04_explore-dea-results.r
90_limma-model-term-name-fiasco.r

The figures left behind:

02_pre-dea-filtering-preDE-filtering.png
03-dea-with-limma-voom-voom-plot.png
04_explore-dea-results-focus-term-adjusted-p-values1.png
04_explore-dea-results-focus-term-adjusted-p-values2.png
...
90_limma-model-term-name-fiasco-first-voom.png
90_limma-model-term-name-fiasco-second-voom.png
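Because each figure name begins with the number of the script that produced it, a figure can be traced back to its script mechanically. The sketch below shows one way to do that in Python. The helper names are hypothetical, the file names are copied from the two listings above, and the figures hidden behind the "..." are left out. Matching uses the numeric prefix only, because one figure name has a hyphen after its number rather than an underscore.

```python
import re
from collections import defaultdict

scripts = [
    "01_marshal-data.r",
    "02_pre-dea-filtering.r",
    "03_dea-with-limma-voom.r",
    "04_explore-dea-results.r",
    "90_limma-model-term-name-fiasco.r",
]

figures = [
    "02_pre-dea-filtering-preDE-filtering.png",
    "03-dea-with-limma-voom-voom-plot.png",
    "04_explore-dea-results-focus-term-adjusted-p-values1.png",
    "04_explore-dea-results-focus-term-adjusted-p-values2.png",
    "90_limma-model-term-name-fiasco-first-voom.png",
    "90_limma-model-term-name-fiasco-second-voom.png",
]


def numeric_prefix(name):
    """Return the leading digits of a file name, or None if there are none."""
    match = re.match(r"\d+", name)
    return match.group(0) if match else None


def figures_by_script(scripts, figures):
    """Group figure files under the script that shares their numeric prefix."""
    by_prefix = {numeric_prefix(s): s for s in scripts}
    grouped = defaultdict(list)
    for fig in figures:
        script = by_prefix.get(numeric_prefix(fig))
        if script is not None:
            grouped[script].append(fig)
    return dict(grouped)


for script, figs in figures_by_script(scripts, figures).items():
    print(script, "->", len(figs), "figure(s)")
```

Scripts that leave no figure behind, such as `01_marshal-data.r` in the listing above, simply do not appear in the result.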

Recap

File organization should reflect inputs vs outputs and the flow of information

/Users/jenny/research/bohlmann/White_Pine_Weevil_DE:
drwxr-xr-x  20 jenny  staff        680 Apr 14 15:44 analysis
drwxr-xr-x   7 jenny  staff        238 Jun  3  2014 data
drwxr-xr-x  22 jenny  staff        748 Jun 23  2014 model-exposition
drwxr-xr-x   4 jenny  staff        136 Jun  3  2014 results
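A layout like the one above can be created up front, before any analysis starts. The following is a minimal Python sketch under that assumption. The directory names come from the listing, while the one-line role descriptions are this sketch's reading of the earlier sections and `make_skeleton` is a hypothetical helper.

```python
from pathlib import Path
import tempfile

# Directory names from the listing; the descriptions are an interpretation.
PROJECT_DIRS = {
    "data": "inputs: raw data and ready-to-analyze data",
    "analysis": "R scripts, Markdown reports and the figures they create",
    "results": "outputs: tab-delimited files of estimates and statistics",
    "model-exposition": "explanatory documents and slides for collaborators",
}


def make_skeleton(root):
    """Create the top-level project directories and return their sorted names."""
    root = Path(root)
    for name in PROJECT_DIRS:
        (root / name).mkdir(parents=True, exist_ok=True)
    return sorted(p.name for p in root.iterdir() if p.is_dir())


# Demonstrate in a throwaway directory so nothing is left behind.
with tempfile.TemporaryDirectory() as tmp:
    created = make_skeleton(tmp)
    print(created)
```

Using `exist_ok=True` makes the helper safe to re-run on a project that already has some of the directories.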


Key Points

  • File organization is important: it should reflect inputs vs outputs and the flow of information.