Summary and Setup

Good data organization is the foundation of any research project. Most researchers keep their data in spreadsheets, so that is where many research projects start.

Typically we organize data in spreadsheets in the ways that we, as humans, want to work with it. However, computers require data to be organized in particular ways. To use tools that make computation more efficient, such as programming languages like R or Python, we need to structure our data the way computers need it. Since spreadsheets are where most research projects start, this is where we want to start too!
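
To make the difference concrete, here is a minimal sketch in Python. The table, its column names (household_id, village, members) and the helper total_members are all invented for illustration; they are not part of the lesson's dataset. The first table has one row per observation and one value per cell, so a program can read it directly. The second table looks fine to a human but mixes text into a numeric column.

```python
import csv
import io

# Hypothetical data, invented for this illustration only.
# One row per observation, one column per variable, one value per cell.
tidy_csv = """household_id,village,members
1,North,3
2,North,7
3,South,4
"""


def total_members(csv_text):
    """Sum the 'members' column of a CSV table held in a string."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return sum(int(row["members"]) for row in reader)


print(total_members(tidy_csv))  # prints 14

# The same kind of information entered the way a person might type it:
# a unit inside the cell, and a number written out as a word.
messy_csv = """household_id,village,members
1,North,3 people
2,North,seven
"""

try:
    total_members(messy_csv)
except ValueError as err:
    # int() cannot convert "3 people" to a number, so the sum fails.
    print("Could not read the members column:", err)
```

You do not need to run or understand this code to follow the lesson. It only shows why a layout that is easy for people to read can stop a program at the first cell it cannot interpret.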

In this lesson, you will learn:

  • Good data entry practices - formatting data tables in spreadsheets
  • How to avoid common formatting mistakes
  • Approaches for handling dates in spreadsheets
  • Basic quality control and data manipulation in spreadsheets
  • Exporting data from spreadsheets

In this lesson, however, you will not learn about data analysis with spreadsheets. Much of your time as a researcher will be spent in the initial ‘data wrangling’ stage, where you need to organize the data to perform a proper analysis later. It’s not the most fun, but it is necessary. In this lesson you will learn how to think about data organization and some practices for more effective data wrangling. With this approach you can better format current data and plan new data collection so less data wrangling is needed.

Getting Started

Data Carpentry’s teaching is hands-on, so participants are encouraged to use their own computers to ensure the proper setup of tools for an efficient workflow.
These lessons assume no prior knowledge of the skills or tools.

To get started, follow the directions in the “Setup” tab to download data to your computer and follow any installation instructions.

Prerequisites

This lesson requires a working copy of spreadsheet software, such as Microsoft Excel, LibreOffice, or OpenOffice.org (see more details in “Setup”).
To most effectively use these materials, please make sure to install everything before working through this lesson.

For Instructors

If you are teaching this lesson in a workshop, please see the Instructor notes.

Data

You need to download some files to follow this lesson:

  1. Download the following three files:
  2. Place these three files in a folder you can easily find and access on your computer (for instance, in a datacarpentry-spreadsheets folder on your Desktop or within your Home folder).

About the data

For more information about the dataset and to download it from Figshare, check out the Social Sciences workshop data page.

Software

To work through this tutorial you will need access to a spreadsheet program. You have many options, including Microsoft Excel, LibreOffice, Apple Numbers, Gnumeric, ONLYOFFICE, and WPS Office. Commands may differ a bit between programs, but the general ideas for thinking about spreadsheets are the same.

For this lesson, we encourage you to use LibreOffice or Microsoft Excel, as the tasks we will be doing have been tested in these programs. If you don’t have Microsoft Excel, you can use LibreOffice, a free, open-source spreadsheet program. Here are the instructions to install it:

Windows

  • Download the Installer
    Download LibreOffice by going to the installation page. The version for Windows should automatically be selected. Click Download. You will go to a page that asks about a donation, but you don’t need to make one. Your download should begin automatically.
  • Install LibreOffice
    Once the installer is downloaded, double-click it and it should install.

macOS

  • Download the Installer
    Download LibreOffice by going to the installation page. The version for macOS should automatically be selected. Click Download. You will go to a page that asks about a donation, but you don’t need to make one. Your download should begin automatically.
  • Install LibreOffice
    The file LibreOffice_X.X.X_MacOS_x86-64 (for whichever version of LibreOffice you selected) should have been downloaded. Double-click this file, and LibreOffice will be installed.

Linux

  • Download the Installer
    Download LibreOffice by going to the installation page. The version for Linux should automatically be selected. Click Download. You will go to a page that asks about a donation, but you don’t need to make one. Your download should begin automatically.
  • Install LibreOffice
    Once the installer is downloaded, double-click it and it should install.