GIS Jake: January 2018

Friday, January 5, 2018

Zion's Fireworks: A Map of Canyons in Zion National Park

I recently created this map of Zion Canyon in Zion National Park for a good friend who loves canyoneering, and I really like the result; it ticks all the right boxes for me. The hillshade exposes the drama and excitement of the terrain, and I think I got the elevation color ramp pretty close to awesome.

As a canyoneer you read about all these great canyons, but sometimes it's hard to get a good mental map of where they are in relation to more common points of interest. The USGS topos usually label canyons and routes that took their name from an old established canyon name, like Heaps, Imlay, and Behunin Canyons. However, names like Das Boot and Misery Canyon were created by the canyoneering community and are nowhere to be found. Fortunately, the Geographic Names Information System (GNIS) from the USGS includes most of these canyons, allowing us to create our own map:

(Keep reading after the map for a little bit more on the creation process)

Click here for a higher-resolution image

The Process

I really wanted to show off the majesty of the landscape, to show how the canyons cut sharp clefts into the terrain. I found 1-meter elevation data from the USGS for the Park, but it had some annoying artifacts that appear to come from the lidar data or processing:

However, it smoothed out quite nicely at this scale using the MDenoise program created by Sun et al. (2007) while still retaining a lot of the sharp details of the sheer red-rock cliffs:

I added a slope layer as well, symbolized as flat = white to steep = black and made it about 80% transparent. This doesn't show up much in the canyons, but brought out a little bit more detail in the lower cliff face of Zion Canyon.

Then, I created a custom color ramp based on the colors from the NPS' geology guide for the park to echo the actual rock layers and formations. It does a decent job, but can't quite do it justice without some representation for the vegetation on the plateau tops. It's also not properly indexed; the slight tilt in the strata relative to a flat sea-level "zero elevation point" means that, on opposite ends of the park, you could see different formations at the same elevation. In this map, every point of a given elevation is the same color.

The roads and rivers layers should hopefully help orient you to places you remember if you've visited the Park: the switchbacks up to the tunnel, and the end of the road at the Temple of Sinawava. I've let the flow of the labels, along with the hillshade, reveal the canyons themselves by setting the canyon lines to "No Color." Finally, a couple points of interest, like The Sentinel and Angels Landing, are identified to give a little bit more context and orientation.

And a Little Bit of Map Philosophy

All maps are, to an extent, an abstracted representation of reality. Some try to eliminate this as much as possible, using satellite imagery and high-accuracy polygons for every feature imaginable. Others embrace their abstractness, using labels in place of lines and funky symbols that represent an entire town.

With its color ramp and hillshade, I want this map to straddle the border between representational and realistic. The colors don't match exactly with what formations are found in the exposed strata at any given exact place in the canyon. But, with the whites and greens capping the domes and plateaus and the changing shades of tans and brownish-reds descending to the canyon floor, your mind (hopefully) is reminded of the grand wonders that exist in the real world.

I hope we never lose the ability to be amazed by these wonders.

Monday, January 1, 2018

A Functional Introduction to GDAL, Part 2: GeoTiff Compression

So you just got a hard drive containing some gorgeous new 6" resolution aerial imagery for your hometown and you can't wait to play with it. Upon opening the imagery folder you're hit with a grim reality: high-res imagery file sizes are huge.

Compression techniques allow us to eliminate redundant data in a raster file in such a way that we can recreate the original file (lossless compression) or enough of the original to look good (lossy compression).

Because the GeoTiff format is so widely supported in other software and offers a multitude of compression options, this post focuses on the compression options available in the GeoTiff driver. The compression scheme is specified with the creation option -co compress=<type>. The GeoTiff driver page has a list of all the available compression schemes.

Lossless and Lossy Compression

Compression methods fall into two general categories: "lossy" and "lossless."

"Lossy" compression schemes throw away chunks of data that they think aren't too important for the sake of smaller file sizes. JPEG compression relies on the way the human brain processes visual imagery to throw away varying amounts of "extraneous" data that (depending on the amount of compression specified) we don't usually notice. This results in files that are drastically smaller than if we stored three full 8-bit values for each raster cell.

It's called lossy compression because you will lose data. Once you use a lossy compression scheme on a raster, you cannot recover the original data. You can get something that looks very similar, but the underlying values are irrevocably changed.

In contrast, "lossless" compression schemes use patterns and other mathematical magic to compress the data in a way that it can be perfectly reproduced when it's decompressed. The compression ratios for lossless compression schemes usually aren't quite as good as lossy compression schemes, but you keep all of your original data.
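To make the difference concrete, here's a minimal sketch using Python's built-in zlib module (zlib implements deflate, the same family of compression discussed below). The "raster row" is made-up data, and the rounding step is only a crude stand-in for lossy compression; it is not how jpeg actually works.

```python
import zlib

# Made-up "raster row": 8-bit cell values with long runs, like flat terrain or water.
original = bytes([10] * 500 + [11] * 300 + list(range(200)))

# Lossless: deflate the bytes, then inflate them again.
compressed = zlib.compress(original, 6)
restored = zlib.decompress(compressed)
print(len(original), "bytes ->", len(compressed), "bytes")
print("lossless round trip identical:", restored == original)

# Crude stand-in for lossy compression (NOT jpeg): round every value down
# to a multiple of 8. The result is similar, but the originals are gone.
quantized = bytes((value // 8) * 8 for value in original)
print("quantized identical:", quantized == original)
```

The deflated copy comes back byte-for-byte identical, while the rounded copy can never be turned back into the original values.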

As a rule, never use lossy compression on data where the cell value is important, like elevation data or reflectance values in multispectral imaging. It does work great for aerial imagery that you're just trying to use as a basemap, however.

Lossy Compression with JPEG

My preferred aerial imagery compression scheme is jpeg with YCbCr color representation (-co compress=jpeg -co photometric=ycbcr). If you're big into color theory you can probably explain better than I can what YCbCr is and why exactly it helps, but long story short, it can sometimes give you better jpeg compression ratios (read: smaller files) for aerial imagery.

You can also play around with the jpeg quality creation option, -co jpeg_quality=<1-100> (higher quality = less data loss but larger file size). I've found the default of 75 to work pretty well for the projects I've done without major (or even noticeable) visual degradation. This gives me files that are roughly 5% to 10% the size of what they'd be without any compression. I've seen smaller ECW files from vendors, but this is close enough for what I need.

One gotcha I've come across with jpeg compression is artifacts, both inside the image data area and in the NoData areas along some borders. I'll write more on this later, with visual examples, but here are two guidelines to reduce/eliminate jpeg compression artifacts:

To eliminate artifacts on the outside edges, make sure the original image data area is perfectly rectangular. An image of merged tiles that shift down a couple pixels every tile, creating a stair-step border of white NoData areas when merged together, will show some artifacts in the white NoData areas.
To eliminate artifacts inside the data areas on the right and bottom of your image, make sure the tile/strip size is a multiple of 8 or 16 (see the note about RGB jpeg compression on the GeoTiff driver page linked above): -co blockxsize=256 -co blockysize=256. If that doesn't work, make sure your total image height and width are multiples of your block size (see this mailing list thread).
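The second guideline is easy to check before you start a long conversion. This is just a sketch: jpeg_tiling_warnings is a hypothetical helper of my own (not part of GDAL), and it assumes the stricter multiple-of-16 rule for the block size.

```python
def jpeg_tiling_warnings(width, height, block_x=256, block_y=256):
    """Return warnings for a planned jpeg-compressed, tiled GeoTiff.

    Hypothetical helper: it only does the arithmetic from the guidelines
    above and never touches any files.
    """
    warnings = []
    # Assumption: use the stricter "multiple of 16" rule for block sizes.
    for name, size in (("blockxsize", block_x), ("blockysize", block_y)):
        if size % 16 != 0:
            warnings.append(f"{name}={size} is not a multiple of 16")
    # Partial blocks on the right and bottom edges are where artifacts can show up.
    if width % block_x != 0:
        warnings.append(f"width {width} leaves a partial block of {width % block_x} px")
    if height % block_y != 0:
        warnings.append(f"height {height} leaves a partial block of {height % block_y} px")
    return warnings


# The 7204 x 7204 example image used later in this post, with 256 x 256 tiles.
for warning in jpeg_tiling_warnings(7204, 7204):
    print(warning)
```

An empty list means the dimensions line up with the block size; it doesn't guarantee an artifact-free image.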

Lossless Compression with Deflate and LZW

When I'm working with large elevation datasets and want compression that preserves the original data, I'll usually use deflate compression (-co compress=deflate). This generally seems to give me higher compression ratios than LZW compression (the other main lossless compression scheme)—I've seen it reduce file size by around 50%, but your mileage may vary based on specific datasets.

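You can use the -co predictor and -co num_threads options when working with LZW or deflate compression. The predictor option changes how values are encoded before compression and can improve the compression ratio (2 is horizontal differencing, 3 is floating point prediction), while num_threads spreads the compression work across multiple CPU cores to speed it up. If you're just working with aerial imagery, -co predictor=3 doesn't make sense because your data isn't in floating point format (i.e., it's all whole numbers). These can be useful for very large datasets, but the smallish example dataset below only took a few seconds to compress no matter the method.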
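If you're curious what horizontal differencing (predictor=2) actually does, here's a rough illustration using Python's zlib as a stand-in for the deflate step. The elevation row is made up (a steady slope plus a little noise) and this is not GDAL's own code, just the same idea: store each cell as the difference from its left-hand neighbor, then compress.

```python
import random
import struct
import zlib

rng = random.Random(0)  # fixed seed so the run is repeatable
# Made-up elevation row: a steady slope plus a little noise, stored as UInt16.
row = [1000 + 3 * i + rng.randint(-2, 2) for i in range(20000)]


def pack(values):
    """Pack a list of integers as little-endian unsigned 16-bit values."""
    return struct.pack(f"<{len(values)}H", *values)


# Horizontal differencing: keep the first value, then store each cell as the
# difference from its left-hand neighbor (modulo 2**16 so it stays unsigned).
diffs = [row[0]] + [(b - a) % 65536 for a, b in zip(row, row[1:])]

plain = zlib.compress(pack(row), 6)
predicted = zlib.compress(pack(diffs), 6)
print("deflate only:        ", len(plain), "bytes")
print("differencing+deflate:", len(predicted), "bytes")

# The step is fully reversible, so the compression is still lossless.
restored = [diffs[0]]
for diff in diffs[1:]:
    restored.append((restored[-1] + diff) % 65536)
print("restored exactly:", restored == row)
```

Smooth data like elevation tends to benefit because neighboring cells differ by small, repetitive amounts; noisy data may see little or no gain.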

Compression Comparison

I used the San Francisco Bay image available from Earthstar Geographics to do a basic comparison of the compression schemes we've discussed. The image is 7204 x 7204 pixels in three bands of type byte, originally in the GeoTiff format. Note that the compression percentages will vary between every image (the relatively poor compression for LZW surprised me here) but the general ranking should stay the same.

Compression Scheme	File Size	% of Original File Size	Lossy/Lossless
Uncompressed	155.9 MB	n/a	n/a
Deflate (z=6)	112.7 MB	72.3%	Lossless
LZW	142.9 MB	91.7%	Lossless
jpeg (75% Quality)	25.1 MB	16.1%	Lossy
jpeg (YCbCr, 75% Quality)	9.1 MB	5.8%	Lossy
jpeg (YCbCr, 90% Quality)	14.9 MB	9.6%	Lossy
ECW (2:1, from vendor)	28.3 MB	18.2%	Lossy

Note: The file sizes here were calculated on a Mac, which defines a MB as 1000^2 bytes (decimal-based), while Windows currently defines a MB as 1024^2 bytes (binary-based). If you download the example file yourself, you'll see the metadata file says it's 148 MB, which would be the Windows version of the size. Confusing? Yeah.
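The conversion between the two definitions is simple arithmetic. A quick sketch using the uncompressed size from the table:

```python
# "155.9 MB" as reported on the Mac, where 1 MB = 1000^2 bytes.
size_bytes = 155.9 * 1000**2

# The same number of bytes counted in 1024^2-byte units (the Windows "MB").
size_binary = size_bytes / 1024**2

print(f"{size_binary:.1f}")  # prints 148.7
```

That lines up with the roughly 148 MB listed in the example file's metadata.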

The jpeg compression is the clear winner here, and zoomed in to actual size (one pixel in the image is one pixel on my monitor) I couldn't tell any difference even between the 75% quality jpeg and the uncompressed GeoTiff.

Compression and BigTIFF

The default TIFF format only allows for files 4 GB or smaller. GDAL overcomes this barrier with the BigTIFF creation option that allows for really big files. The default option is if_needed, which tries to figure out if the new file will be greater than 4 GB and sets the flag if needed. Because this is the default, you have probably already created uncompressed GeoTiffs larger than 4 GB without even realizing this option exists.

However, compression throws a wrench into this default. The driver doesn't know how well the compression will work and fails to set the BigTIFF option, which often leads to write errors when creating large datasets. Whenever I'm working with large compressed datasets I use -co bigtiff=yes if there's any possibility the output will be greater than 4 GB.
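A quick way to decide whether to reach for -co bigtiff=yes is to work out the uncompressed size, since that is a reasonable worst case for the compressed output. This is a minimal sketch; both function names are my own and the 40000 x 40000 mosaic is a made-up example.

```python
# Classic TIFF uses 32-bit offsets, so files top out at 4 GB (2**32 bytes).
CLASSIC_TIFF_LIMIT = 2**32


def uncompressed_bytes(width, height, bands, bytes_per_sample):
    """Raw pixel data size, ignoring headers and overviews."""
    return width * height * bands * bytes_per_sample


def might_need_bigtiff(width, height, bands, bytes_per_sample):
    """True if the raw pixel data alone reaches the classic TIFF limit."""
    return uncompressed_bytes(width, height, bands, bytes_per_sample) >= CLASSIC_TIFF_LIMIT


# The 7204 x 7204, three-band byte image from the comparison above.
print(might_need_bigtiff(7204, 7204, 3, 1))
# A made-up 40000 x 40000, three-band byte mosaic.
print(might_need_bigtiff(40000, 40000, 3, 1))
```

If the answer is anywhere near the limit, setting bigtiff=yes up front is cheaper than rerunning a failed job.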

ECW and MrSid

I've mentioned "ECW" and "MrSid" a couple times in this series. These are different formats (not compression options for GeoTiffs) that use mathematical magic known as wavelets to achieve lossy compression ratios greater than what's possible with jpeg, and with better image quality and data access to boot. Several Jpeg2000 compression schemes also use wavelet-based compression in a similar manner.

The potential downside is that these are not open formats: they are protected by patents and require special (and possibly expensive) licenses. There's usually a free license for reading the data (which is what allows GDAL to read them), but creating them (or in the case of ECW, distributing images from a server as part of a webmap) requires ponying up the license fee. You're the only one who can decide if that route is best for you.

Next up

Our next post will investigate gdal_translate, one of the most common and most useful GDAL tools.

Notes
Paul Ramsey's excellently concise compression discussion: http://blog.cleverelephant.ca/2015/02/geotiff-compression-for-dummies.html
Kersten Clauss' discussion (and test script!) on GDAL's lossless compression options: http://www.digital-geography.com/geotiff-compression-comparison/
A deep discussion on jpeg compression by Even Rouault, one of the GDAL developers: http://erouault.blogspot.com/2014/04/advanced-jpeg-in-tiff-uses-in-gdal.html (Seriously, this guy is awesome. Super knowledgeable and very active on the gdal-dev mailing list).

A Functional Introduction to GDAL, Part 1: Creation Options

Many GDAL tools create a new file that holds the results of their work. Most of these tools support the -of, -ot, and -co switches to set the output format, the output data type, and any format-specific options.

Setting the Raster Format: -of <name>

There are dozens upon dozens of different raster formats in the GIS world, and GDAL can handle many of them. The ability to read or write a format is provided by what GDAL calls "drivers," and there's a specific driver for each format.

While GDAL can read just about anything you throw at it, it can only write a handful of formats. The list of supported formats can be found in the GDAL documentation. The "Create" column will tell you if write support is present for your desired format. "Copy" is slightly different from create and depends on the raster data already being present. If you're opening and modifying an existing raster, you can use a format that has Copy capabilities; if you're creating a new raster, you need to use a Create format. See the CreateCopy() documentation for more info.

Many formats require additional work. For example, the ECW driver requires the ECW SDK from Hexagon Geospatial, which has two different types of licensing. The read-only license is freely available for desktop use (which does not allow you to use it with server software distributing imagery), while the server read-only and desktop read-write licenses must be paid for. Getting this working is beyond the scope of this post, though I think I've seen the stand-alone QGIS installation come with read support enabled.

The name in the "Code" column is the "short format name" mentioned in the help documentation and is the name you'll use with the -of switch. For example, if I want a GeoTiff, I'd type -of GTiff.

Note: As of GDAL 2.3, GDAL tries to guess the output format based on the file extension of the output filename you provide. To get a GeoTiff, your filename just has to end in .tif. In previous versions, it defaults to GeoTiff if no format is specified with -of.

One format that you'll notice doesn't have creation support is jpeg, which seems odd at first blush. However, if you want highly-compressed rasters, the jpeg compression method for GeoTiffs (see the next post on compression options) provides comparable results, just with a .tif file.

Specifying an Output Data Type: -ot <type>

Different datasets will have different basic data types, which define how the data for each raster cell are stored. A DEM with elevations from 1272.3 to 3311.9 may be in Float32, while aerial imagery may have three Byte bands—red, green, and blue, with values from 0 to 255.

Each data type has specific minimum and maximum values it can represent. In addition, Byte and the various Ints (or integers) can only represent whole numbers, while the Float types allow you to represent decimal numbers, like an elevation of 1564.357 (remember, this value by itself is unitless—you need to check your metadata to see whether it's an elevation in feet, or just a brightness value from 0 to 255).

The following table shows the minimum and maximum values for each type (assuming they match the equivalent C++ types). This is taken from the GDAL API reference and the Microsoft C++ reference (the various int min/max values should stay consistent with other compilers and platforms, but there may be slight variations in the Float32 and Float64 values).

Type	Minimum	Maximum	Precision
Byte	0	255	Whole Numbers
UInt16	0	65,535	Whole Numbers
Int16	-32,768	32,767	Whole Numbers
UInt32	0	4,294,967,295	Whole Numbers
Int32	-2,147,483,648	2,147,483,647	Whole Numbers
Float32	-3.4x10^38	3.4x10^38	About 6 Significant Digits
Float64	-1.7x10^308	1.7x10^308	About 15 Significant Digits

The -ot switch allows you to specify the type of the output file. There are a couple nuances here, however, that we have to keep in mind:

First, specifying a data type does not automatically scale the data to fit into that data type. Rather, it just tries to copy the value straight across, clipping out data that doesn't fit. Going from Byte to Float32 is just fine: 232 becomes 232.0. However, going the other way presents problems: 67.395 becomes just 67, and 5692.845 gets clipped down to 255. Use gdal_translate with the -scale switch to appropriately scale your data into the new data type's ranges.
Different types require different amounts of storage for each value. A Byte, as we'd expect, takes up one byte per cell, while a Float32 takes up four bytes (32 bits, 8 bits to a byte). This means a Float32 raster will be about four times as large as a Byte raster of the same dimensions, limiting your storage options and increasing your file read times. This also means processing the same area would require four times the allocated RAM, which could lead to memory size issues on larger data sets and 32-bit environments.
Some programs can't handle certain data types. For example, I wanted to make a custom hillshade represented by brightness values of 0-255 with a NoData value, so I specified -ot uint16 and a NoData value of 256. However, when I tried to load it into GIMP to play around with it as a raster image, it failed because GIMP can't handle uint16 data (Note: The default gdaldem hillshade command creates a byte-type raster, using 0 as NoData and 1-255 as the brightness values. My insistence on using the range 0-255 for brightness values was foolishness.).
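The difference between a straight copy and a scaled copy, from the first point above, is easier to see with numbers. This is a sketch of the idea only, not GDAL's code: both function names are mine, and I'm assuming values are rounded to the nearest whole number before being clamped.

```python
def copy_to_byte(value):
    """Straight copy into a Byte: round (assumed), then clamp to 0-255."""
    return max(0, min(255, round(value)))


def scale_to_byte(value, src_min, src_max):
    """Linear stretch of src_min..src_max onto 0..255, similar in spirit to -scale."""
    fraction = (value - src_min) / (src_max - src_min)
    return max(0, min(255, round(fraction * 255)))


# Straight copy: small values survive, large ones pile up at 255.
print(copy_to_byte(67.395), copy_to_byte(5692.845))

# Scaled copy of the example DEM range (1272.3 to 3311.9) from earlier in the post.
for elevation in (1272.3, 2000.0, 3311.9):
    print(elevation, "->", scale_to_byte(elevation, 1272.3, 3311.9))
```

Scaling keeps the relative differences between cells, but the output values are no longer elevations, so it only makes sense for display products.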

Creation Options: -co <option>

Creation options allow you to set certain parameters that are specific to each raster format. For example, the GeoTiff driver gives you the ability to compress your image (-co compress=<type>) or to create an accompanying .tfw world file (-co tfw=yes). Multiple -co switches can be used in one command, like this set that creates a GeoTiff with jpeg compression, a YCbCr color scheme, and the ability to create a file bigger than 4GB: -co compress=jpeg -co photometric=ycbcr -co bigtiff=yes.

The creation options available to each driver are listed on the driver's documentation page, which is linked from the format list above. For example, the GeoTiff creation options can be found about halfway through the GeoTiff format documentation.

The documentation usually shows all the creation options in upper case. This isn't absolutely necessary, however. I usually just use lower case for all the options (like you see in these examples) because it's easier to type.

Sidetrack: Configuration Options

There's a bit of a grey area between the definitions of creation options and configuration options (https://trac.osgeo.org/gdal/wiki/ConfigOptions). Generally speaking, creation options change the way data is stored in the file you create, while configuration options affect how GDAL functions to create that file. You can specify configuration options as part of a command with the --config switch, like this: --config gdal_cachemax 2048. Note that configuration options don't have '=' between the option and its value.

Note: The distinction between creation and configuration options gets even grayer in gdaladdo, where you set the compression options for your overviews via the --config switch.

Putting It All Together
Let's say we want to change an Erdas Imagine (.img) file to a jpeg-compressed GeoTiff:
gdal_translate -of GTiff -co compress=jpeg -co photometric=ycbcr -co tiled=yes -co bigtiff=yes in.img out.tif
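If you end up converting a lot of files, it can help to build the command from a dictionary of creation options instead of retyping every -co switch. Here's a minimal sketch; build_translate_command is a hypothetical helper of my own, and it only assembles the argument list, it doesn't run GDAL.

```python
import shlex


def build_translate_command(src, dst, out_format="GTiff", creation_options=None):
    """Assemble a gdal_translate argument list (hypothetical helper)."""
    command = ["gdal_translate", "-of", out_format]
    # Each creation option becomes its own "-co key=value" pair.
    for key, value in (creation_options or {}).items():
        command += ["-co", f"{key}={value}"]
    command += [src, dst]
    return command


options = {
    "compress": "jpeg",
    "photometric": "ycbcr",
    "tiled": "yes",
    "bigtiff": "yes",
}
command = build_translate_command("in.img", "out.tif", creation_options=options)
print(shlex.join(command))
```

This prints the same command shown above. If GDAL is installed, the list can be handed to subprocess.run(command, check=True) to actually run it.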

Coming Up Next

Next post we'll look at the compression options available with the GeoTiff driver.

Notes
More on ECW: http://www.gdal.org/frmt_ecw.html, https://trac.osgeo.org/gdal/wiki/ECW, and https://gis.stackexchange.com/questions/154929/how-to-install-gdal-with-ecw-support-on-windows?rq=1