The time allotted for the teaching and exercises in lessons one through eight in this episode totals 135 minutes. This does not include time for installing OpenRefine, which could take an extra 10-30 minutes depending on how many different platforms and how many computers need it installed.
The setup instructions for installing OpenRefine are in a separate file (setup).
The datasets used
- The dataset used in this lesson can be downloaded from Figshare through the link on the setup page.
- It will need to be downloaded to the local machine before it can be loaded into OpenRefine.
- A general description of the dataset used in the Social Sciences lessons can be found on the workshop data home page.
- Explains what OpenRefine is, what it is used for and where to get help.
- Covers the creation of an OpenRefine project using our dataset.
- The file is in CSV format and has a single header row.
- Facets and clustering are introduced and there is a discussion on the different clustering algorithms and how they may produce different results.
- Splitting columns is covered, as is undo/redo.
- Using Include and Exclude from a facet is covered and the difference between faceting and filtering is explained.
- The various sort options for single or multiple columns are covered.
- Explains that everything is a string until you change it.
- Explains how to change the data type and the additional faceting ability it provides.
- Explains how actions within a project can be copied to an external file, and how that same file is then used to re-apply the changes.
- Covers the overall format of a project ‘file’ and how the components can be viewed.
- This may require installing additional software on Windows machines (e.g. 7-Zip), as the built-in un-zipping facility does not work with tar.gz files.
- A list of various OpenRefine resources available online (taken from the Ecology lessons).
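
When preparing the discussion of clustering algorithms mentioned above, it may help to have a small demonstration of why two methods can group the same values differently. The Python sketch below is a simplified approximation of two key-collision ideas (a token-based fingerprint and a character n-gram fingerprint). It is not OpenRefine's actual implementation, and the function names and sample values are made up for illustration.

```python
import re
import unicodedata
from collections import defaultdict


def _to_ascii(text):
    # Strip accents by decomposing characters and dropping non-ASCII parts.
    return unicodedata.normalize("NFKD", text).encode("ascii", "ignore").decode("ascii")


def fingerprint(value):
    # Token-based key: trim, lowercase, drop punctuation,
    # then sort and de-duplicate the whitespace-separated tokens.
    text = _to_ascii(value.strip().lower())
    text = re.sub(r"[^\w\s]", "", text)
    return " ".join(sorted(set(text.split())))


def ngram_fingerprint(value, n=2):
    # Character-based key: lowercase, drop everything except letters and
    # digits, then sort and de-duplicate the n-character substrings.
    text = re.sub(r"[^a-z0-9]", "", _to_ascii(value.lower()))
    grams = {text[i:i + n] for i in range(len(text) - n + 1)}
    return "".join(sorted(grams))


def cluster(values, key_func):
    # Group values that share a key; keep only groups with more than one member.
    groups = defaultdict(set)
    for value in values:
        groups[key_func(value)].add(value)
    return sorted(sorted(group) for group in groups.values() if len(group) > 1)


# Hypothetical messy values, invented for this sketch.
values = [
    "Chirodzo", "chirodzo ", "Chirodzo.",
    "Ruaca Nhamuenda", "Nhamuenda Ruaca", "RuacaNhamuenda",
]

print("fingerprint:", cluster(values, fingerprint))
print("2-gram fingerprint:", cluster(values, ngram_fingerprint))
```

In this sketch the token-based key ignores word order but is sensitive to a missing space, while the character-based key ignores spacing but is sensitive to word order, so the two methods propose different clusters for the same column.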