Monday, January 1, 2018

A Functional Introduction to GDAL, Part 1: Creation Options

Many GDAL tools create a new file that holds the results of their work. Most of these tools support the -of, -ot, and -co switches to set the output format, the output data type, and any format-specific options.

Setting the Raster Format: -of <name>

There are dozens upon dozens of different raster formats in the GIS world, and GDAL can handle many of them. The ability to read or write a format is provided by what GDAL calls "drivers," and there's a specific driver for each format.

While GDAL can read just about anything you throw at it, it can write only a subset of those formats. The list of supported formats can be found in the GDAL documentation. The "Create" column tells you whether write support is present for your desired format. "Copy" is slightly different from Create: it depends on the raster data already being present. If you're opening and modifying an existing raster, you can use a format that has Copy capabilities; if you're creating a new raster, you need to use a Create format. See the CreateCopy() documentation for more info.

Many formats require additional work. For example, the ECW driver requires the ECW SDK from Hexagon Geospatial, which comes with two different types of licensing. The read-only license is freely available for desktop use (it does not allow you to use it with server software distributing imagery), while server read-only and desktop read-write use require a paid license. Getting this working is beyond the scope of this post, though I think I've seen the stand-alone QGIS installation come with read support enabled.

The name in the "Code" column is the "short format name" mentioned in the help documentation and is the name you'll use with the -of switch. For example, if I want a GeoTiff, I'd type -of GTiff.

Note: As of GDAL 2.3, GDAL tries to guess the output format from the file extension of the output filename you provide, so to get a GeoTiff you just need a filename ending in .tif. Previous versions defaulted to GeoTiff if no format was specified with -of.
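To make the "explicit -of wins, otherwise look at the extension" idea concrete, here is a minimal Python sketch. It is not how GDAL does the lookup internally (GDAL asks its registered drivers which extensions they handle); the function name pick_output_format and the small hand-written extension table are assumptions made purely for illustration.

```python
from pathlib import Path

# Hypothetical, hand-written subset of extension -> driver short name.
# GDAL itself consults its registered drivers instead of a fixed table.
EXTENSION_TO_DRIVER = {
    ".tif": "GTiff",
    ".tiff": "GTiff",
    ".img": "HFA",   # Erdas Imagine
    ".vrt": "VRT",
}


def pick_output_format(filename, explicit_format=None):
    """Return a driver short name, or None if this sketch cannot tell."""
    if explicit_format:
        # An explicit -of value always takes priority over the extension.
        return explicit_format
    suffix = Path(filename).suffix.lower()
    return EXTENSION_TO_DRIVER.get(suffix)


print(pick_output_format("out.tif"))
print(pick_output_format("out.dat", explicit_format="ENVI"))
```

When the extension is ambiguous or unusual, passing -of explicitly removes the guesswork.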

One format that you'll notice doesn't have creation support is jpeg, which seems odd at first blush. However, if you want highly-compressed rasters, the jpeg compression method for GeoTiffs (see the next post on compression options) provides comparable results, just with a .tif file.

Specifying an Output Data Type: -ot <type>

Different datasets will have different basic data types, which define how the data for each raster cell are stored. A DEM with elevations from 1272.3 to 3311.9 may be in Float32, while aerial imagery may have three Byte bands—red, green, and blue, with values from 0 to 255.

Each data type has specific minimum and maximum values it can represent. In addition, Byte and the various Ints (or integers) can only represent whole numbers, while the Float types allow you to represent decimal numbers, like an elevation of 1564.357 (remember, this value by itself is unitless—you need to check your metadata to see whether it's an elevation in feet, or just a brightness value from 0 to 255).

The following table shows the minimum and maximum values for each type (assuming they match the equivalent C++ types). This is taken from the GDAL API reference and the Microsoft C++ reference (the various int min/max values should stay consistent with other compilers and platforms, but there may be slight variations in the Float32 and Float64 values).


Type      Minimum           Maximum          Precision
Byte      0                 255              Whole numbers
UInt16    0                 65,535           Whole numbers
Int16     -32,768           32,767           Whole numbers
UInt32    0                 4,294,967,295    Whole numbers
Int32     -2,147,483,648    2,147,483,647    Whole numbers
Float32   -3.4x10^38        3.4x10^38        About 6 significant digits
Float64   -1.7x10^308       1.7x10^308       About 15 significant digits
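If you'd rather derive these limits than memorize them, the integer ranges follow directly from each type's bit width, and the floating-point limits are the largest finite IEEE 754 values. The sketch below assumes two's-complement integers and IEEE 754 floats, which is what you'll find on essentially every current platform; the helper name integer_range is my own.

```python
import struct
import sys


def integer_range(bits, signed):
    """Minimum and maximum of a two's-complement integer of the given width."""
    if signed:
        return -(2 ** (bits - 1)), 2 ** (bits - 1) - 1
    return 0, 2 ** bits - 1


# GDAL type name -> (bit width, signed?)
INTEGER_TYPES = {
    "Byte": (8, False),
    "UInt16": (16, False),
    "Int16": (16, True),
    "UInt32": (32, False),
    "Int32": (32, True),
}

for name, (bits, signed) in INTEGER_TYPES.items():
    low, high = integer_range(bits, signed)
    print(f"{name:<8}{low:>16,}{high:>16,}")

# Largest finite single-precision value has the bit pattern 0x7F7FFFFF.
float32_max = struct.unpack(">f", bytes.fromhex("7f7fffff"))[0]
# Python floats are double precision, so this is the Float64 maximum.
float64_max = sys.float_info.max

print(f"Float32 {-float32_max:.3e} {float32_max:.3e}")
print(f"Float64 {-float64_max:.3e} {float64_max:.3e}")
```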

The -ot switch allows you to specify the data type of the output file. There are a few nuances to keep in mind:
  • Specifying a data type does not automatically scale the data to fit into that data type. Rather, GDAL just tries to copy each value straight across, clamping values that don't fit. Going from Byte to Float32 is just fine: 232 becomes 232.0. However, going the other way presents problems: 67.395 becomes just 67, and 5692.845 gets clipped down to 255. Use gdal_translate with the -scale switch to appropriately scale your data into the new data type's range.
  • Different types require different amounts of storage for each value. A Byte, as we'd expect, takes up one byte per cell, while a Float32 takes up four bytes (32 bits, 8 bits to a byte). This means a Float32 raster will be about four times as large as a Byte raster of the same dimensions, limiting your storage options and increasing your file read times. This also means processing the same area would require four times the allocated RAM, which could lead to memory size issues on larger data sets and 32-bit environments.
  • Some programs can't handle certain data types. For example, I wanted to make a custom hillshade represented by brightness values of 0-255 with a NoData value, so I specified -ot uint16 and a NoData value of 256. However, when I tried to load it into GIMP to play around with it as a raster image, it failed because GIMP can't handle uint16 data. (Note: The default gdaldem hillshade command creates a byte-type raster, using 0 as NoData and 1-255 as the brightness values. My insistence on using the range 0-255 for brightness values was foolishness.)
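The first two points are easy to check with a little arithmetic. The Python sketch below is only an illustration of the ideas, not GDAL's code: to_byte assumes round-to-nearest followed by clamping (GDAL's exact rounding rule may differ), linear_scale is the plain linear mapping that -scale describes, and raster_size_bytes counts uncompressed pixel data only. All three function names are my own.

```python
import math

# Bytes needed to store one cell value of each type.
BYTES_PER_VALUE = {
    "Byte": 1, "UInt16": 2, "Int16": 2,
    "UInt32": 4, "Int32": 4, "Float32": 4, "Float64": 8,
}


def to_byte(value):
    """Round to the nearest whole number, then clamp into 0..255.

    Assumption: round-half-up plus clamping; GDAL's internals may differ.
    """
    rounded = math.floor(value + 0.5)
    return max(0, min(255, rounded))


def linear_scale(value, src_min, src_max, dst_min=0.0, dst_max=255.0):
    """Map the range src_min..src_max linearly onto dst_min..dst_max."""
    fraction = (value - src_min) / (src_max - src_min)
    return dst_min + fraction * (dst_max - dst_min)


def raster_size_bytes(width, height, bands, data_type):
    """Uncompressed size of the pixel data alone (no headers or overviews)."""
    return width * height * bands * BYTES_PER_VALUE[data_type]


# Straight copy: the fraction is lost and large values pile up at 255.
print(to_byte(67.395), to_byte(5692.845))

# Scaling first preserves the relative differences between elevations.
for elevation in (1272.3, 2292.1, 3311.9):
    print(elevation, to_byte(linear_scale(elevation, 1272.3, 3311.9)))

# Same dimensions, four times the bytes.
print(raster_size_bytes(10000, 10000, 1, "Byte"))
print(raster_size_bytes(10000, 10000, 1, "Float32"))
```

Without scaling, every elevation in that DEM would land on 255; with it, the lowest elevation maps to 0 and the highest to 255.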

Creation Options: -co <option>

Creation options allow you to set certain parameters that are specific to each raster format. For example, the GeoTiff driver gives you the ability to compress your image (-co compress=<type>) or to create an accompanying .tfw world file (-co tfw=yes). Multiple -co switches can be used in one command, like this set that creates a GeoTiff with jpeg compression, a YCbCr color scheme, and the ability to create a file bigger than 4 GB: -co compress=jpeg -co photometric=ycbcr -co bigtiff=yes.

The creation options available to each driver are listed on the driver's documentation page, which is linked from the format list above. For example, the GeoTiff creation options can be found about halfway through the GeoTiff format documentation.

The documentation usually shows all the creation options in upper case. This isn't absolutely necessary, however. I usually just use lower case for all the options (like you see in these examples) because it's easier to type.

Sidetrack: Configuration Options

There's a bit of a grey area between the definitions of creation options and configuration options (https://trac.osgeo.org/gdal/wiki/ConfigOptions). Generally speaking, creation options change the way data is stored in the file you create, while configuration options affect how GDAL functions to create that file. You can specify configuration options as part of a command with the --config switch, like this: --config gdal_cachemax 2048. Note that configuration options, unlike creation options, don't have '=' between the option and its value.

Note: The distinction between creation and configuration options gets even grayer in gdaladdo, where you set the compression options for your overviews via the --config switch.
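The syntax difference between the two kinds of options is easy to trip over, so here it is side by side. This is a small Python sketch that only builds argument lists (it doesn't call GDAL); the two helper names are made up for this example.

```python
def creation_option_args(options):
    """-co takes one KEY=VALUE token per option."""
    args = []
    for key, value in options.items():
        args += ["-co", f"{key}={value}"]
    return args


def config_option_args(options):
    """--config takes the key and the value as two separate tokens."""
    args = []
    for key, value in options.items():
        args += ["--config", key, str(value)]
    return args


print(creation_option_args({"compress": "jpeg", "tfw": "yes"}))
print(config_option_args({"gdal_cachemax": 2048}))
```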

Putting It All Together

Let's say we want to change an Erdas Imagine (.img) file to a jpeg-compressed GeoTiff:
gdal_translate -of GTiff -co compress=jpeg -co photometric=ycbcr -co tiled=yes -co bigtiff=yes in.img out.tif
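
If you're scripting conversions like this, it can help to assemble the command from its parts rather than pasting strings together. Below is a minimal Python sketch that rebuilds the command above; translate_command and run are hypothetical helper names, and the sketch assumes gdal_translate is on your PATH if you actually call run.

```python
import shlex
import shutil
import subprocess


def translate_command(src, dst, output_format=None, output_type=None,
                      creation_options=None):
    """Build a gdal_translate argument list from the -of, -ot and -co parts."""
    cmd = ["gdal_translate"]
    if output_format:
        cmd += ["-of", output_format]
    if output_type:
        cmd += ["-ot", output_type]
    for key, value in (creation_options or {}).items():
        cmd += ["-co", f"{key}={value}"]
    cmd += [src, dst]
    return cmd


def run(cmd):
    """Run the command, but only if the executable is actually on PATH."""
    if shutil.which(cmd[0]) is None:
        raise FileNotFoundError(f"{cmd[0]} not found on PATH")
    subprocess.run(cmd, check=True)


cmd = translate_command(
    "in.img", "out.tif",
    output_format="GTiff",
    creation_options={
        "compress": "jpeg",
        "photometric": "ycbcr",
        "tiled": "yes",
        "bigtiff": "yes",
    },
)

# Print the command for inspection; call run(cmd) to execute it.
print(shlex.join(cmd))
```

Passing the arguments as a list (rather than one big string) means filenames with spaces don't need any special quoting.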

Coming Up Next

Next post we'll look at the compression options available with the GeoTiff driver.

Notes
More on ECW: http://www.gdal.org/frmt_ecw.html, https://trac.osgeo.org/gdal/wiki/ECW, and https://gis.stackexchange.com/questions/154929/how-to-install-gdal-with-ecw-support-on-windows?rq=1