Introduction


  • OpenRefine is a powerful, free and open source tool that can be used for data cleaning.
  • OpenRefine automatically tracks every step you take while working with your data.

Importing Data to OpenRefine


  • Use the Create Project option to import data
  • You can control how data is imported using the options on the import screen
  • Several file types can be imported into OpenRefine

Exploring Data with OpenRefine


  • Faceting can identify errors or outliers in data
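
To make the idea behind a text facet concrete, here is a minimal Python sketch that counts how often each distinct value occurs in a column. It is not OpenRefine's implementation; the function name text_facet and the sample values are made up for illustration. Values that appear only once or twice next to a frequent, similar-looking value are often typos or outliers.

```python
from collections import Counter


def text_facet(values):
    """Count how often each distinct value occurs, most frequent first."""
    return Counter(values).most_common()


# Hypothetical column of village names; "Chirdozo" is a deliberate typo.
villages = ["Chirodzo", "Chirodzo", "Chirodzo", "Ruaca", "Ruaca", "Chirdozo"]

for value, count in text_facet(villages):
    print(f"{count:>3}  {value}")
```

The rare value at the bottom of the list is the one worth inspecting first, which is how a facet is typically read in OpenRefine as well.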

Transforming Data


  • Clustering can identify outliers in data and help us fix errors in bulk
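
The following Python sketch illustrates the general idea of key-collision clustering: each value is reduced to a normalized key, and values that share a key are grouped as probable variants of the same thing. It is a simplified approximation written for this summary, not OpenRefine's own code, and the names fingerprint and cluster as well as the sample data are assumptions.

```python
import string
import unicodedata
from collections import defaultdict


def fingerprint(value):
    """Simplified key: drop accents and punctuation, lowercase, sort unique tokens."""
    ascii_text = (
        unicodedata.normalize("NFKD", value)
        .encode("ascii", "ignore")
        .decode("ascii")
    )
    cleaned = ascii_text.strip().lower().translate(
        str.maketrans("", "", string.punctuation)
    )
    return " ".join(sorted(set(cleaned.split())))


def cluster(values):
    """Group distinct values by fingerprint; keep groups with more than one variant."""
    groups = defaultdict(set)
    for value in values:
        groups[fingerprint(value)].add(value)
    return {
        key: sorted(variants)
        for key, variants in groups.items()
        if len(variants) > 1
    }


# Hypothetical values: three spellings of one place and one unrelated value.
names = ["Ruaca - Nhamuenda", "Nhamuenda, Ruaca", "ruaca nhamuenda", "Chirodzo"]
print(cluster(names))
```

Once the variants are grouped, replacing every member of a group with a single chosen spelling is the bulk fix that clustering makes possible.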

Filtering and Sorting with OpenRefine


  • OpenRefine provides various ways to sort and filter data without affecting the raw data.
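
As a rough analogy for filtering and sorting as views, the Python sketch below builds new lists and leaves the original rows untouched. The row structure, the column names, and the helpers text_filter and sort_view are hypothetical and only illustrate the principle.

```python
# Hypothetical rows; in OpenRefine these would be the rows of a project.
rows = [
    {"village": "Ruaca", "members": 7},
    {"village": "God", "members": 3},
    {"village": "Chirodzo", "members": 12},
]


def text_filter(rows, column, needle):
    """Return a new list of rows whose column contains `needle` (case-insensitive)."""
    return [row for row in rows if needle.lower() in str(row[column]).lower()]


def sort_view(rows, column, reverse=False):
    """Return a new sorted list; the input list keeps its original order."""
    return sorted(rows, key=lambda row: row[column], reverse=reverse)


filtered = text_filter(rows, "village", "o")
by_size = sort_view(rows, "members", reverse=True)

print([row["village"] for row in filtered])
print([row["village"] for row in by_size])
print([row["village"] for row in rows])  # original order is unchanged
```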

Reconciliation of Values


  • OpenRefine can use existing reconciliation services to enrich data

Looking Up Data


  • OpenRefine can fetch data from custom URLs that are built from values in an OpenRefine project
  • The API calls behind such lookups can be custom built
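
Building a lookup URL from a cell value can be sketched in Python as follows. The endpoint https://api.example.org/lookup is a placeholder, not a real service, and build_lookup_url is a hypothetical helper; the sketch only constructs the URLs and makes no network requests.

```python
from urllib.parse import quote

# Placeholder endpoint used only for illustration; substitute a real API.
BASE_URL = "https://api.example.org/lookup?q="


def build_lookup_url(cell_value):
    """Trim the cell value and percent-encode it so it is safe inside a URL."""
    return BASE_URL + quote(cell_value.strip())


# Hypothetical cell values from one column of a project.
titles = ["Data Carpentry", "  Open Science  "]
urls = [build_lookup_url(title) for title in titles]

for url in urls:
    print(url)
```

Encoding the value matters because spaces and other special characters in a cell would otherwise produce an invalid or ambiguous URL.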

Exporting Data Cleaning Steps


  • OpenRefine tracks all changes, and this history can be exported as a script to reuse in future analyses or to reproduce an analysis.
  • Scripts can (and should) be published together with the dataset as part of the digital appendix of the research output.
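
An exported set of cleaning steps is a JSON document, so it can be inspected with ordinary tools before it is published. The Python sketch below parses a small, hand-written example and lists the steps in order. The sample is illustrative only: its field names follow what OpenRefine typically exports, but treat them as an assumption and check them against your own export. The helper summarize is hypothetical.

```python
import json

# Hand-written, illustrative operation history (not taken from a real project).
history_json = """
[
  {
    "op": "core/text-transform",
    "columnName": "village",
    "expression": "value.trim()",
    "description": "Text transform on cells in column village using expression value.trim()"
  },
  {
    "op": "core/column-rename",
    "oldColumnName": "no_membrs",
    "newColumnName": "members",
    "description": "Rename column no_membrs to members"
  }
]
"""


def summarize(history_text):
    """Return (operation id, description) pairs in the order they were applied."""
    steps = json.loads(history_text)
    return [(step["op"], step.get("description", "")) for step in steps]


for number, (op, description) in enumerate(summarize(history_json), start=1):
    print(f"{number}. {op}: {description}")
```

A short listing like this can accompany the script in a digital appendix so that readers see what each step does without opening OpenRefine.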

Exporting and Saving Data from OpenRefine


  • OpenRefine can save the cleaned data in a number of formats.
  • Cleaned data or entire projects can be exported from OpenRefine.
  • Projects can be shared with collaborators, enabling them to see, reproduce and check all data cleaning steps you performed.

Other Resources in OpenRefine


  • Other examples and resources online are good for learning more about OpenRefine