Data Cleaning with OpenRefine for Ecologists: Glossary

Key Points

  • OpenRefine is a powerful, free and open source tool that can be used for data cleaning.

  • OpenRefine will automatically track any steps you take in working with your data.

Exploring Data with OpenRefine
  • Faceting can identify errors or outliers in data.
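
To illustrate the idea behind a text facet outside OpenRefine, here is a minimal Python sketch. It counts how often each distinct value occurs in a column, so rare or oddly spelled values stand out. The `species` list and the `text_facet` helper are hypothetical and made up for this example; this is not how OpenRefine is implemented.

```python
from collections import Counter

# Hypothetical column of species names with two inconsistent entries
# (a capitalised variant and one with a trailing space).
species = ["albigula", "albigula", "Albigula", "merriami", "merriami ", "merriami"]


def text_facet(values):
    """Return (value, count) pairs, most frequent first, similar in spirit to a text facet."""
    return Counter(values).most_common()


facet = text_facet(species)
for value, count in facet:
    # repr() makes stray whitespace visible in the printed value.
    print(f"{value!r}: {count}")
```

Values that appear only once or twice next to a frequent, near-identical value are good candidates for a closer look.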

Transforming Data
  • Clustering can identify outliers in data and help us fix errors in bulk.
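
The following Python sketch shows, in simplified form, how key-collision clustering can group values that differ only in case, spacing, punctuation or word order. It is loosely modelled on the "fingerprint" approach and is an approximation for illustration, not OpenRefine's actual code. The `fingerprint` and `cluster` helpers and the sample names are assumptions made for this example.

```python
import string
import unicodedata
from collections import defaultdict


def fingerprint(value):
    """Build a normalised key: trim, lowercase, fold accents, drop punctuation, sort unique tokens."""
    text = value.strip().lower()
    # Fold accented characters to their closest ASCII form.
    text = unicodedata.normalize("NFKD", text).encode("ascii", "ignore").decode("ascii")
    # Remove ASCII punctuation.
    text = text.translate(str.maketrans("", "", string.punctuation))
    # Split on whitespace, de-duplicate and sort the tokens.
    return " ".join(sorted(set(text.split())))


def cluster(values):
    """Group distinct values that share a key; keep only groups with more than one variant."""
    groups = defaultdict(set)
    for value in values:
        groups[fingerprint(value)].add(value)
    return [sorted(group) for group in groups.values() if len(group) > 1]


# Hypothetical scientific names with inconsistent formatting.
names = [
    "Ammospermophilus harrisi",
    "ammospermophilus harrisi ",
    "Harrisi, Ammospermophilus",
    "Dipodomys merriami",
]

clusters = cluster(names)
for group in clusters:
    print(group)
```

Each printed group holds variants that could be merged into one value in a single bulk edit, after checking that they really refer to the same thing.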

Filtering and Sorting with OpenRefine
  • OpenRefine provides various ways to sort and filter data without affecting the raw data.

Exporting Data Cleaning Steps
  • OpenRefine tracks all changes, and this history can be exported as a script to reapply the same steps in future analyses or to reproduce an analysis.

  • Scripts can (and should) be published together with the dataset as part of the digital appendix of the research output.
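
As a rough sketch of how an exported history could be inspected before it is published, the Python snippet below reads a list of operations from JSON and prints a numbered summary. The JSON excerpt is hypothetical and shortened. The field names `op`, `columnName`, `expression` and `description` are assumptions about the export format and should be checked against a real export. The `summarize` helper is also made up for this example.

```python
import json

# Hypothetical, shortened excerpt of an exported operation history.
history_json = """
[
  {"op": "core/text-transform",
   "columnName": "scientificName",
   "expression": "value.trim()",
   "description": "Text transform on cells in column scientificName using expression value.trim()"},
  {"op": "core/mass-edit",
   "columnName": "country",
   "description": "Mass edit cells in column country"}
]
"""


def summarize(history_text):
    """Return (operation, description) pairs for each recorded step."""
    operations = json.loads(history_text)
    return [(step.get("op", "unknown"), step.get("description", "")) for step in operations]


steps = summarize(history_json)
for number, (op, description) in enumerate(steps, start=1):
    print(f"{number}. [{op}] {description}")
```

A plain-text summary like this can sit alongside the script itself in a digital appendix, so readers can see what was done without opening OpenRefine.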

Exporting and Saving Data from OpenRefine
  • Cleaned data or entire projects can be exported from OpenRefine.

  • Projects can be shared with collaborators, enabling them to see, reproduce and check all data cleaning steps you performed.

Other Resources in OpenRefine
  • Other examples and resources available online are useful for learning more about OpenRefine.