Examining Numbers in OpenRefine
OverviewTeaching: 10 min
Exercises: 10 minQuestions
How can we convert a column from one data type to another?
How can we find non-numeric values in a column that should contain numbers?Objectives
Transform a text column into a number column.
Identify and modify non-numeric values in a numeric column using facets.
When a table is imported into OpenRefine, all columns are treated as containing
text values. We saw earlier how we can sort column values as numbers, but this
does not change the cells in a column from text to numbers. Rather, this
interprets the values as numbers for the purposes of sorting but keeps the
underlying data type as is. We can, however, transform columns from text to
other data types (e.g. number or date) using the
Edit cells >
transforms feature. Here we will experiment changing columns to numbers and
see what additional capabilities that grants us.
Be sure to remove any
Text filter facets you have enabled from the left panel
so that we can examine our whole dataset. You can remove an existing facet by
x in the upper left of that facet window.
To transform cells in the
years_farm column to numbers, click the down arrow
for that column, then
Edit cells >
Common transforms… >
To number. You
will notice the
years_farm values change from left-justified to
right-justified, and black to green in color.
Transform three more columns,
buildings_in_compound, from text to numbers. Can all columns be transformed to numbers? - Try it with
Only observations that include only numerals (0-9) can be transformed to numbers. If you apply a number transformation to a column that doesn’t meet this criteria, and then click the
Undo / Redotab, you will see a step that starts with
Text transform on 0 cells. This means that the data in that column was not transformed.
Sometimes non-numeric values or blanks occur in a column where numbers are
expected. Such values may represent errors in data entry, and we want to find
them. We can do that with a
- For a column you transformed to numbers, edit one or two cells, replacing the numbers with text (such as
abc) or blank (no number or text). You will need to change the
textusing the drop-down menu.
- Use the column pulldown menu to apply a numeric facet to the column you edited. The facet will appear in the left panel.
- Notice that there are several checkboxes in this facet:
Error. Below these checkboxes are counts of the number of cells in each category. You should see checks for
Blankif you changed some values.
- Experiment with checking or unchecking these boxes to select subsets of your data.
When done examining the numeric data, remove this facet by clicking the
the upper left corner of its panel. Note that this does not undo the edits you
made to the cells in this column. Use the
Undo / Redo function to reverse
OpenRefine also provides ways to to examine and clean numerical data.