File Organization: Naming
Overview
Teaching: 30 min
Exercises: 10 minQuestions
What are the common file organization errors?
What are best practices for file organization?
Objectives
Highlight common SNAFUs
Learn to employ unit testing
Names matter
NO
myabstract.docx
Joe’s Filenames Use Spaces and Punctuation.xlsx
figure 1.png
fig 2.png
JW7d^(2sl@deletethisandyourcareerisoverWx2*.txt
YES
2014-06-08_abstract-for-sla.docx
joes-filenames-are-getting-better.xlsx
fig01_scatterplot-talk-length-vs-interest.png
fig02_histogram-talk-attendance.png
1986-01-28_raw-data-from-challenger-o-rings.txt
Three principles for (file) names:
- Machine readable
- Human readable
- Plays well with default ordering
Awesome file names :)
Machine readable
Machine readable
- Regular expression and globbing friendly
- Avoid spaces, punctuation, accented characters, case sensitivity
- Easy to compute on
- Deliberate use of delimiters
Globbing
Except of complete file listing:
Example of globbing to narrow file listing:
Same using Mac OS Finder search facilities:
Same using regex in R
:
Punctuation
Deliberate use of “-“ and “_” allows recovery of meta-data from the filenames:
- “_” underscore used to delimit units of meta-data I want later.
- ”-“ hyphen used to delimit words so my eyes don’t bleed.
This happens to be R
but also possible in the shell
, Python
, etc.
Recap: machine readable
- Easy to search for files later
- Easy to narrow file lists based on names
- Easy to extract info from file names, e.g. by splitting
- New to regular expressions and globbing? Be kind to yourself and avoid
- Spaces in file names
- Punctuation
- Accented characters
- Different files named
foo
andFoo
Human readable
Human readable
- Name contains info on content
- Connects to concept of a slug from semantic URLs
Example
Which set of file(name)s do you want at 3 a.m. before a deadline?
Embrace the slug
Recap: Human readable
Easy to figure out what the heck something is, based on its name
Plays well with default ordering
Plays well with default ordering
- Put something numeric first
- Use the ISO 8601 standard for dates
- Left pad other numbers with zeros
Examples
Chronological order:
Logical order: Put something numeric first
Dates: Use the ISO 8601 standard for dates: YYYY-MM-DD
Left pad other numbers with zeros
If you don’t left pad, you get this:
10_final-figs-for-publication.R
1_data-cleaning.R
2_fit-model.R
which is just sad :(
Recap: Plays well with default ordering
- Put something numeric first
- Use the ISO 8601 standard for dates
- Left pad other numbers with zeros
Recap
Three principles for (file) names
- Machine readable
- Human readable
- Plays well with default ordering
Pros
- Easy to implement NOW
- Payoffs accumulate as your skills evolve and projects get more complex.
Go forth and use awesome file names :)
Key Points
File organization is important.