This lesson is in the early stages of development (Alpha version)

Data Analysis and Visualization with Python for Social Scientists *alpha*

Lesson Maintainers: Stephen Childs, Geoffrey Boushey, Annajiat Alim Rasel

Python is a general-purpose programming language that is useful for writing scripts to work effectively and reproducibly with data.

This is an introduction to Python designed for participants with no programming experience. These lessons can be taught in a day (~ 6 hours). They start with basic information about Python syntax and the Jupyter notebook interface, then move through importing CSV files, using the pandas package to work with data frames, calculating summary information from a data frame, and a brief introduction to plotting. The last lesson demonstrates how to work with databases directly from Python.

Getting Started

Data Carpentry’s teaching is hands-on, so participants are encouraged to use their own computers to ensure the proper setup of tools for an efficient workflow.

These lessons assume no prior knowledge of the skills or tools.

To get started, follow the directions in the “Setup” tab to download data to your computer and follow any installation instructions.


This lesson requires a working copy of Python.

To use these materials most effectively, install everything and download the data files mentioned in the Setup tab before working through this lesson.
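One quick way to confirm that Python is working is to run a short check like the sketch below. It prints the Python version and reports which packages cannot be found. The package list is an assumption drawn from the topics named on this page (pandas, Matplotlib, JSON, SQLite); the Setup instructions remain the authoritative list, and `missing_packages` is a hypothetical helper written for this example.

```python
import importlib.util
import sys

# Assumed list, based on the topics named on this page.
# The Setup instructions may require a different set.
PACKAGES = ["pandas", "matplotlib", "json", "sqlite3"]


def missing_packages(names):
    """Return the package names that cannot be found in this environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


print("Python version:", sys.version.split()[0])

missing = missing_packages(PACKAGES)
if missing:
    print("Not found:", ", ".join(missing))
else:
    print("All listed packages were found.")
```

If any package is reported as not found, return to the installation instructions in the Setup tab before starting the lesson.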

For Instructors

If you are teaching this lesson in a workshop, please see the Instructor notes.


Setup   Download files required for the lesson

00:00   1. Introduction to Python
        Why learn Python?
        What are Jupyter notebooks?

00:15   2. Python basics
        How do I assign values to variables?
        How do I do arithmetic?
        What is a built-in function?
        How do I see results?
        What data types are supported in Python?

01:10   3. Python control structures
        What constructs are available for changing the flow of a program?
        How can I repeat an action many times?
        How can I perform the same task(s) on a set of items?

01:55   4. Creating re-usable code
        What are user defined functions?
        How can I automate my code for re-use?

02:35   5. Processing data from a file
        How can I read and write files?
        What kind of data files can I read?

03:45   6. Dates and Time
        How are dates and time represented in Python?
        How can I manipulate dates and times?

04:10   7. Processing JSON data
        What is JSON format?
        How can I extract specific data items from a JSON record?
        How can I convert an array of JSON records into a table?

04:55   8. Reading data from a file using Pandas
        What is Pandas?
        How do I read files using Pandas?
        What is the difference between reading files using Pandas and other methods of reading files?

05:15   9. Extracting rows and columns
        How can I extract specific rows and columns from a Dataframe?
        How can I add or delete columns from a Dataframe?
        How can I find and change missing values in a Dataframe?

05:45   10. Data Aggregation using Pandas
        How can I summarise the data in a data frame?

06:15   11. Joining Pandas Dataframes
        How can I join two Dataframes with a common key?

06:50   12. Wide and long data formats
        What are long and wide formats?
        Why would I want to change between them?

07:25   13. Data visualisation using Matplotlib
        How can I create visualisations of my data?

08:15   14. Accessing SQLite Databases
        How can I access database tables using Pandas and Python?
        What are the advantages of storing data in a database?

09:15   Finish

The actual schedule may vary slightly depending on the topics and exercises chosen by the instructor.