
Cleaning up the attribute table
There are many reasons why we need to clean up attribute tables every now and then. These may be because we receive badly structured or named data from external sources, or because data processing, such as the layer joins that we performed in the previous exercise, require some post processing. This recipe shows us how to use attribute table and the Table Manager plugin to rename, delete, and reorder columns, as well as how to convert between different data types using Field Calculator.
Getting ready
If you performed the previous recipe, just save the joined layer to a new shapefile; otherwise, load census_wake2000_pop.shp
. In any case, you will notice that the dataset contains a lot of duplicate information, and the column names could use some love as well. To follow this recipe, you should also install and enable the Table Manager plugin by navigating to Plugins | Manage and Install Plugins.
How to do it…
- Our first step to clean up this dataset is to delete duplicated information. From all available columns, we only want to keep
_STATE
,_COUNTY
,_TRACT
,FIPSSTCO
,TRT2000
,STFID
,_POP2000
,AREA
, andPERIMETER
. - To delete the other columns, enable editing using the Toggle editing mode button in the upper-left corner of the attribute table or by pressing Ctrl + E. This activates the Delete column button.
- Alternatively, you can also press Ctrl + L to open the Delete attributes dialog. This dialog allows us to delete multiple columns at once. Just select all the columns that you want to be deleted, press OK, and QGIS will display the reduced attribute table.
- Next, we will rename columns to remove the leading underscores in some of the column names. This can be done using the Table Manager plugin.
- When you start the plugin (edit mode should be disabled), you will see a list of the layer columns. The plugin allows you to change the order of columns, as well as rename, insert, clone, and delete columns.
- To rename a column, just select it in the list and press the Rename button. You'll then be asked to provide a new name. Go ahead and remove the leading underscores from
_STATE
,_COUNTY
,_TRACT
, and_POP2000
. - Finally, using the Move up and Move down buttons, you can also rearrange the column order to something more intuitive. We'd suggest moving
STFID
to the first position andAREA
andPERIMETER
to the last. - If you press Save, the changes will be saved back to the layer source file. Alternatively, you can also create a new file using Save as....
How it works…
The steps provided in this exercise are mostly limited to layers with shapefile sources. If you use other input data formats, such as MIF, GML, or GeoJSON files, you will notice that the Toggle editing button is grayed out because these files cannot be edited in QGIS. Whether a certain format can be edited in QGIS or not depends on which functionality has been implemented in the respective GDAL/OGR driver.
Note
The GDAL/OGR version that is used by QGIS is either part of the QGIS package (as in the case of the Windows installers) or QGIS uses the GDAL library existing in your system (on Linux and Mac). To get access to specific drivers that are not supported by the provided GDAL/OGR version, it is possible to compile custom versions of GDAL/OGR, but the details of doing this are out of the scope of this cookbook.
There's more…
Another common task while dealing with attribute table management is changing column data types. Currently, it is not possible to simply change the data type directly. Instead, we have to use Field Calculator (which is directly accessible through the corresponding button in the Attributes toolbar or from the attribute table dialog) to perform conversions and create a new column for the result.
In our census_wake2000_pop.shp
file, for example, the tract ID, TRACT
, is stored in a REAL
type column with a precision of 15 digits even though it may be preferable to simply have it in a STRING
column and formatted to two digits after the decimal separator. To create such a column using Field Calculator, we can use the following expression:
format_number("TRACT",2)
Compared to a simple conversion (which would be simple, use tostring("TRACT"), format_number("TRACT",2)
offers the advantage that all values will be formatted to display two digits after the decimal separator, while a simple conversion would drop these digits if they are zeros.
Of course, it's also common to convert from text to numerical. In this case, you can chose between toint()
and toreal()
.
See also
- Have a look through the conversion functions in the Field Calculator Function list to see the other available functions that can deal with date and time data types. Usage of all these functions is explained in Selected function help directly in the calculator dialog.