Introduction


  • Good data organization is the foundation of any research project.

Formatting data tables in spreadsheets


  • Never modify your raw data. Always make a copy before making any changes.
  • Record every step you take to clean your data in a plain text file.
  • Organize your data according to tidy data principles.
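
The first two points above can also be scripted. The following is a minimal sketch, assuming the raw data is a single file on disk; the function name make_working_copy, the folder name working and the log file name cleaning_log.txt are hypothetical, chosen only for illustration.

```python
import shutil
from datetime import datetime
from pathlib import Path


def make_working_copy(raw_path, work_dir, log_path):
    """Copy the raw file into work_dir and note the step in a plain text log."""
    raw_path = Path(raw_path)
    work_dir = Path(work_dir)
    work_dir.mkdir(parents=True, exist_ok=True)

    working_path = work_dir / raw_path.name
    if working_path.resolve() == raw_path.resolve():
        # Refuse to continue if the "copy" would land on top of the raw file.
        raise ValueError("working copy would overwrite the raw file")

    shutil.copy2(raw_path, working_path)  # the raw file is only read, never written

    # Append one line per step so the log stays a readable history of the cleaning.
    with open(log_path, "a", encoding="utf-8") as log:
        stamp = datetime.now().isoformat(timespec="seconds")
        log.write(f"{stamp}\tcopied {raw_path} to {working_path}\n")
    return working_path
```

Calling make_working_copy("survey_raw.csv", "working", "cleaning_log.txt") would leave the raw file untouched and return the path of the copy to edit. Later cleaning steps could be appended to the same log by hand or by similar code.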

Formatting problems


  • Avoid using multiple tables within one spreadsheet.
  • Avoid spreading data across multiple tabs.
  • Record zeros as zeros.
  • Use one consistent null value, such as a blank cell or NA, to record missing data.
  • Don’t use formatting to convey information or to make your spreadsheet look pretty.
  • Place comments in a separate column.
  • Record units in column headers.
  • Include only one piece of information in a cell.
  • Avoid spaces, numbers and special characters in column headers.
  • Avoid special characters in your data.
  • Record metadata in a separate plain text file.
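
Two of the points above, header naming and the difference between zero and missing, can be illustrated with a short sketch. The helper names clean_header and parse_measurement are hypothetical, and so is the choice of NA as the null value and of "x" as a prefix for headers that begin with a digit. Digits elsewhere in a header are left alone here, which is a looser reading of the rule than the bullet states.

```python
import re


def clean_header(name):
    """Return a header made only of letters, digits and underscores."""
    # Collapse every run of spaces or special characters into one underscore.
    cleaned = re.sub(r"[^0-9A-Za-z]+", "_", name.strip()).strip("_")
    if cleaned and cleaned[0].isdigit():
        cleaned = "x" + cleaned  # arbitrary prefix so the name does not start with a digit
    return cleaned


def parse_measurement(cell, null_token="NA"):
    """Return a float for a recorded value, or None when the value is missing."""
    cell = cell.strip()
    if cell == "" or cell == null_token:
        return None  # missing: nothing was measured
    return float(cell)  # a recorded zero stays 0.0 and is not confused with missing
```

For example, clean_header("Weight (kg)") gives "Weight_kg", keeping the unit in the header, while parse_measurement("0") gives 0.0 and parse_measurement("NA") gives None.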

Dates as data


  • Storing a date as separate pieces of data (year, month and day) rather than as a single value makes it easier to handle.
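
As a rough illustration of this point, the sketch below splits a date into year, month and day and joins the pieces back together. It assumes the date is already written as unambiguous ISO 8601 text (YYYY-MM-DD); the function names split_date and join_date are hypothetical.

```python
from datetime import date


def split_date(iso_text):
    """Split 'YYYY-MM-DD' text into separate year, month and day values."""
    parsed = date.fromisoformat(iso_text)  # raises ValueError for impossible dates
    return {"year": parsed.year, "month": parsed.month, "day": parsed.day}


def join_date(year, month, day):
    """Rebuild 'YYYY-MM-DD' text from separate year, month and day values."""
    return date(year, month, day).isoformat()
```

Here split_date("2015-07-01") returns {"year": 2015, "month": 7, "day": 1}, and join_date(2015, 7, 1) returns "2015-07-01". Stored as three plain integers, the pieces do not depend on how a particular spreadsheet program interprets date cells.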

Quality control


  • Always work on a copy of your original spreadsheet file so you don’t affect the raw data.
  • Use data validation to prevent accidentally entering invalid data.
  • Use sorting to check for invalid data.
  • Use conditional formatting (cautiously) to check for invalid data.
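
The same kind of check can be run outside the spreadsheet once the data is loaded as rows. This is a minimal sketch, assuming each row is a dictionary of text values keyed by column header. The function name find_out_of_range, the column name weight_g and the limits used in the example are hypothetical.

```python
def find_out_of_range(rows, column, low, high, null_token="NA"):
    """Return (row_number, raw_value) pairs that are not numbers within [low, high]."""
    problems = []
    # Start counting at 2 so the numbers match a spreadsheet whose row 1 is the header.
    for row_number, row in enumerate(rows, start=2):
        raw = row[column].strip()
        if raw == "" or raw == null_token:
            continue  # missing values are skipped, not reported as invalid
        try:
            value = float(raw)
        except ValueError:
            problems.append((row_number, raw))  # not a number at all, e.g. a typo
            continue
        if not low <= value <= high:
            problems.append((row_number, raw))  # a number, but outside the allowed range
    return problems
```

With rows holding the weight_g values "40", "-5", "NA" and "4O" (a letter O typed for a zero), find_out_of_range(rows, "weight_g", 0, 500) returns [(3, "-5"), (5, "4O")]. The check reports problems and changes nothing in the data.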

Exporting data


  • Data stored in common spreadsheet formats is often read incorrectly by data analysis software, which introduces errors into your data.
  • Exporting data from spreadsheets to formats like CSV or TSV puts it in a format that can be used consistently by most programs.
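
To show what these plain text formats look like from the analysis side, here is a small sketch that writes a table as CSV or TSV and reads it back using only Python's standard csv module. The function names write_delimited and read_delimited and the example column names are hypothetical. Exporting from the spreadsheet program itself is not shown.

```python
import csv


def write_delimited(path, header, rows, delimiter=","):
    """Write a header and rows as delimited text (CSV by default, TSV with a tab)."""
    with open(path, "w", newline="", encoding="utf-8") as handle:
        writer = csv.writer(handle, delimiter=delimiter)
        writer.writerow(header)
        writer.writerows(rows)  # fields containing the delimiter are quoted automatically


def read_delimited(path, delimiter=","):
    """Read delimited text back as a list of dictionaries keyed by column header."""
    with open(path, newline="", encoding="utf-8") as handle:
        return list(csv.DictReader(handle, delimiter=delimiter))
```

Passing delimiter="\t" to both functions produces and reads TSV instead. Every value comes back as text, so converting values to numbers or dates is left as an explicit step in the analysis.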