This lesson is being piloted (Beta version)

Data Organization in Spreadsheets for Social Scientists

Good data organization is the foundation of any research project. Most researchers have data in spreadsheets, so it’s the place that many research projects start.

We organize data in spreadsheets in the ways that we as humans want to work with the data, but computers require that data be organized in particular ways. To use tools that make computation more efficient, such as programming languages like R or Python, we need to structure our data the way computers need it. Since spreadsheets are where most research projects start, this is where we want to start too!

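In this lesson, you will learn basic principles for using spreadsheets for good data organization, how to avoid common formatting problems, how to handle dates as data, how to carry out basic quality assurance, and how to export data in a way that is useful for downstream applications.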

In this lesson, however, you will not learn about data analysis with spreadsheets. Much of your time as a researcher will be spent in the initial ‘data wrangling’ stage, where you need to organize the data to perform a proper analysis later. It’s not the most fun, but it is necessary. In this lesson you will learn how to think about data organization and some practices for more effective data wrangling. With this approach you can better format current data and plan new data collection so less data wrangling is needed.

Getting Started

Data Carpentry’s teaching is hands-on, so participants are encouraged to use their own computers to ensure the proper setup of tools for an efficient workflow.
These lessons assume no prior knowledge of the skills or tools.

To get started, follow the directions in the “Setup” tab to download data to your computer and follow any installation instructions.

Prerequisites

This lesson requires a working copy of spreadsheet software, such as Microsoft Excel, LibreOffice, or OpenOffice.org (see more details in “Setup”).
To use these materials most effectively, please make sure to install everything before working through this lesson.

For Instructors

If you are teaching this lesson in a workshop, please see the Instructor notes.

Schedule

Setup: Download files required for the lesson
00:00 1. Introduction: What are basic principles for using spreadsheets for good data organization?
00:18 2. Formatting data tables in Spreadsheets: What are some common challenges with formatting data in spreadsheets and how can we avoid them?
00:48 3. Formatting problems: What are some common challenges with formatting data in spreadsheets and how can we avoid them?
01:08 4. Dates as data: What are good approaches for handling dates in spreadsheets?
01:28 5. Quality assurance: How can we carry out basic quality assurance in spreadsheets?
01:53 6. Exporting data: How can we export data from spreadsheets in a way that is useful for downstream applications?
02:08 Finish

The actual schedule may vary slightly depending on the topics and exercises chosen by the instructor.