Friday, October 19, 2018

Introducing raster_chunk_processing.py (aka, RCP)

This is part 1 of a three-part post. Read part 2 (Installation and Usage) here and part 3 (Example Output) here.

After over a year of off-and-on development, I've finally got my second-favorite project ready for the masses (i.e., I finally added command-line argument parsing). I present to you the new, the shiny, raster_chunk_processing.py!

Edit, May 2019:
That feeling when you think you've got something working, only to discover that it's not doing what you think it's doing. RCP's skymodel algorithm has been updated to properly compute shadows. Updated examples will be forthcoming.


Ok, so I need to work on my naming skills.



What Makes RCP Special?
RCP applies any one of several raster processing methods to arbitrarily-large rasters by automatically breaking the raster up into smaller chunks. This (a) avoids out-of-memory errors when working with large rasters, 32-bit programs, and/or the 8 GB or 16 GB memory loadouts of normal desktop computers, and (b) facilitates parallel processing where feasible.

What makes RCP better than re-tiling a large raster and processing the chunks individually? Many of these algorithms compute a new value for each pixel based on the pixels around it. When you run them on individual tiles and then merge the results, the pixels at the tile edges don't have all the information they need and produce a noticeable line in the final image. You also spend time and storage space tiling and merging.

RCP avoids these downsides by dividing the raster into chunks of a user-defined size, adding a few pixels (again, a user-defined amount) to each side of the chunk to avoid the aforementioned edge problems, reading only this expanded chunk into memory, working on it, and finally writing it into its proper location within the output file.
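The overlap-and-trim idea can be sketched in a few lines of NumPy. (The function name and signature here are hypothetical; the real RCP reads and writes through GDAL with pixel offsets rather than slicing an in-memory array.)

```python
import numpy as np

def process_in_chunks(raster, chunk_size, overlap, func):
    """Apply func to a raster in overlapping chunks, writing only the
    un-buffered interior of each processed chunk to the output.
    Simplified sketch: no GDAL, no nodata handling."""
    rows, cols = raster.shape
    out = np.empty_like(raster)
    for r in range(0, rows, chunk_size):
        for c in range(0, cols, chunk_size):
            # Expand the chunk by `overlap` pixels, clamped to the raster edges
            r0, c0 = max(r - overlap, 0), max(c - overlap, 0)
            r1 = min(r + chunk_size + overlap, rows)
            c1 = min(c + chunk_size + overlap, cols)
            processed = func(raster[r0:r1, c0:c1])
            # Write back only the interior, discarding the buffer pixels
            ir1 = min(r + chunk_size, rows)
            ic1 = min(c + chunk_size, cols)
            out[r:ir1, c:ic1] = processed[r - r0:ir1 - r0, c - c0:ic1 - c0]
    return out
```

As long as `overlap` is at least the radius of the processing kernel, every interior pixel sees the same neighbors it would have seen in a single-pass run, so the seams disappear.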

RCP relies on GDAL for all the file access heavy lifting, and I owe a huge shout-out to Frank Warmerdam, Even Rouault, and the rest of the GDAL team.

Processing Methods
Currently, RCP can do several really cool things. The first is a selection of DEM smoothers (blur_mean, blur_gauss, blur_toews, and mdenoise). Next, and really, really fun, is Kennelly and Stewart's skymodel hillshade (along with a bog-standard hillshade algorithm copied shamelessly from the GDAL source). It will also calculate the Topographic Position Index (TPI) with a user-defined window, a step up from the default gdaldem TPI, which uses a fixed window size. Finally, it provides a really esoteric contrast stretcher called Contrast Limited Adaptive Histogram Equalization (CLAHE).

DEM Smoothing
Often, a DEM comes to us with some imperfections. Maybe that new half-foot Lidar is too detailed. Maybe you're tired of those contour-line-esque artifacts in the NED 1/3 arc-second DEMs. Maybe you don't want to show so much detail, but want to maintain the crisp mountain ridges. This is where smoothing helps: in preparation for hillshading or some other cartographic, non-analytical process.

blur_mean smooths the DEM by taking the average of all the pixels within a circle of user-defined radius/diameter around each pixel. This is similar to the ArcGIS Focal Statistics (Mean) tool with a circular neighborhood.
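A minimal sketch of a circular mean filter (hypothetical function name; RCP's actual edge and nodata handling will differ — here the edges are simply reflected):

```python
import numpy as np

def blur_mean(dem, radius):
    """Mean of all pixels within `radius` pixels of each cell.
    Simplified sketch with reflected edges and no nodata handling."""
    # Boolean circular kernel: True where the cell is inside the circle
    y, x = np.ogrid[-radius:radius + 1, -radius:radius + 1]
    kernel = (x**2 + y**2) <= radius**2
    padded = np.pad(dem.astype(float), radius, mode="reflect")
    out = np.zeros_like(dem, dtype=float)
    for dy in range(2 * radius + 1):
        for dx in range(2 * radius + 1):
            if kernel[dy, dx]:
                out += padded[dy:dy + dem.shape[0], dx:dx + dem.shape[1]]
    return out / kernel.sum()
```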

blur_gauss is similar to blur_mean, but it uses a 2D Gaussian distribution kernel for its average, preserving more detail than a regular blur_mean. You could do this with a custom neighborhood in Focal Statistics, but RCP makes it trivial to change the neighborhood size.
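The only change from the mean filter is the weighting: nearby pixels count more than distant ones. A sketch (again with hypothetical names and parameters, and reflected edges):

```python
import numpy as np

def gaussian_kernel(radius, sigma):
    """2D Gaussian kernel, normalized so the weights sum to 1."""
    y, x = np.ogrid[-radius:radius + 1, -radius:radius + 1]
    k = np.exp(-(x**2 + y**2) / (2 * sigma**2))
    return k / k.sum()

def blur_gauss(dem, radius, sigma):
    """Gaussian-weighted smoothing: a weighted average where closer
    pixels get larger weights. Simplified sketch."""
    kernel = gaussian_kernel(radius, sigma)
    padded = np.pad(dem.astype(float), radius, mode="reflect")
    out = np.zeros_like(dem, dtype=float)
    for dy in range(2 * radius + 1):
        for dx in range(2 * radius + 1):
            out += kernel[dy, dx] * padded[dy:dy + dem.shape[0],
                                           dx:dx + dem.shape[1]]
    return out
```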

blur_toews is a smoothing kernel provided by Mike T. at https://gis.stackexchange.com/a/10467. It is slightly different from blur_mean and blur_gauss and may give better results depending on the terrain.

mdenoise uses the mdenoise.exe program, which implements Sun et al.'s (2007) mesh denoising algorithm, again as inspired by Mike T. at https://gis.stackexchange.com/a/12856. The compiled .exe is 32-bit and can only handle small rasters before blowing up and running out of memory. By limiting the size of the chunk we're working on, RCP allows us to use this executable as-is (along with automating the tif->asc->tif conversion process) and to parallelize the process.

Hillshades
My first impetus for this script was Kennelly and Stewart's skymodel hillshade technique, as featured on the cover of the 2016 ESRI Map Book. More optimization work needs to be done, but my ultimate goal is to get this working much faster than ESRI's Terrain Tools script tool through parallelization (and maybe offloading to the GPU).

skymodel is the aforementioned implementation of the skymodel technique, using Dr. Stewart's SkyLum.exe to generate a list of different hillshade azimuths and altitudes that are then combined into the final image.
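Once the individual hillshades exist, combining them is conceptually just a weighted sum: each azimuth/altitude hillshade contributes in proportion to how much light the sky model says comes from that direction. A sketch under that assumption (function name hypothetical, not RCP's actual code):

```python
import numpy as np

def combine_skymodel(hillshades, weights):
    """Weighted sum of pre-computed hillshades into one skymodel image.
    Assumes each hillshade is a 2D array on the same grid; weights are
    normalized so the output stays in the hillshades' value range."""
    weights = np.asarray(weights, dtype=float)
    weights = weights / weights.sum()
    out = np.zeros_like(hillshades[0], dtype=float)
    for hs, w in zip(hillshades, weights):
        out += w * hs
    return out
```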

hillshade is the bog-standard hillshading algorithm lifted from the gdaldem code. This is mainly used to generate all the different hillshades for the skymodel function.
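For reference, a Horn-style hillshade in the spirit of gdaldem's can be sketched with NumPy. This is a simplified rendition, not a port of the GDAL code: interior gradients come from Horn's 3x3 finite differences, and the one-pixel border is handled by edge padding.

```python
import numpy as np

def hillshade(dem, cellsize=1.0, azimuth=315.0, altitude=45.0):
    """Standard hillshade: slope/aspect from Horn's method, then the
    illumination equation. Returns values in 0-255. Sketch only."""
    az = np.radians(360.0 - azimuth + 90.0)   # compass -> math convention
    alt = np.radians(altitude)
    p = np.pad(dem.astype(float), 1, mode="edge")
    # Horn's 3x3 weighted differences in x and y
    dzdx = ((p[:-2, 2:] + 2 * p[1:-1, 2:] + p[2:, 2:]) -
            (p[:-2, :-2] + 2 * p[1:-1, :-2] + p[2:, :-2])) / (8 * cellsize)
    dzdy = ((p[2:, :-2] + 2 * p[2:, 1:-1] + p[2:, 2:]) -
            (p[:-2, :-2] + 2 * p[:-2, 1:-1] + p[:-2, 2:])) / (8 * cellsize)
    slope = np.arctan(np.hypot(dzdx, dzdy))
    aspect = np.arctan2(-dzdy, dzdx)
    shaded = (np.sin(alt) * np.cos(slope) +
              np.cos(alt) * np.sin(slope) * np.cos(az - aspect))
    return np.clip(shaded * 255, 0, 255)
```

A flat DEM comes out a uniform grey (255 times the sine of the sun altitude), which is a handy sanity check.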

Other Utilities
There are two other utilities I've included as I've played around with different raster techniques.

TPI is the normal Topographic Position Index calculation of the pixel value minus the mean of the surrounding pixels, which can then be categorized and displayed as desired. Coincidentally, this is also a high-pass version of the mean filter (http://fourier.eng.hmc.edu/e101/lectures/Fourier_Analysis/node10.html, http://northstar-www.dartmouth.edu/doc/idl/html_6.2/Filtering_an_Imagehvr.html). Hillshading the results of TPI can be rather interesting, depending on the TPI kernel width.
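The TPI calculation itself is short: pixel minus neighborhood mean. A sketch using a circular window (whether the center pixel is excluded from the mean is a convention that varies; it's excluded here, and names are hypothetical):

```python
import numpy as np

def tpi(dem, radius):
    """Topographic Position Index: each pixel minus the mean of its
    circular neighborhood. Positive = ridge/peak, negative = valley.
    Sketch with reflected edges, no nodata handling."""
    y, x = np.ogrid[-radius:radius + 1, -radius:radius + 1]
    kernel = (x**2 + y**2) <= radius**2
    kernel[radius, radius] = False      # exclude the center pixel
    padded = np.pad(dem.astype(float), radius, mode="reflect")
    total = np.zeros_like(dem, dtype=float)
    for dy in range(2 * radius + 1):
        for dx in range(2 * radius + 1):
            if kernel[dy, dx]:
                total += padded[dy:dy + dem.shape[0], dx:dx + dem.shape[1]]
    return dem - total / kernel.sum()
```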

CLAHE is some really cool image magic that increases the contrast of your image in neat ways (https://imagej.net/Enhance_Local_Contrast_(CLAHE), https://en.wikipedia.org/wiki/Adaptive_histogram_equalization#Contrast_Limited_AHE). Yeah, I don't quite understand all the implications myself... but it can look cool! I've used it to increase the contrast of hillshades that I've then multiplied into aerial imagery to "bumpmap" the imagery. Increasing the contrast allows me to darken the shadows without increasing the darkness of the grey midtones.

WARNING: CLAHE relies on knowing the histogram of the entire image, so running it in chunks can cause visibly different stretches for each chunk. If you make your chunks large enough that the histograms are roughly the same, this difference nearly disappears.

For example, if one chunk covers mainly valley floors (where there are few hills and thus few shadows) and another chunk covers mountains (with lots of whites and lots of darks), they're going to have different histograms and thus different stretches. However, if your chunks are big enough that they cover both valley and mountains, the histograms should be similar enough to get away with it.
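For intuition, here is the core remapping that CLAHE builds on: plain global histogram equalization, where a pixel's new value is its position in the cumulative histogram. CLAHE goes further by equalizing per tile with a clip limit on the histogram and blending the tile mappings bilinearly; this sketch (hypothetical function, 8-bit-style images) shows only the global version.

```python
import numpy as np

def equalize(img, levels=256):
    """Global histogram equalization: map each value to its (scaled)
    position in the cumulative distribution. Sketch only; CLAHE adds
    tiling, histogram clipping, and bilinear blending on top of this."""
    hist, _ = np.histogram(img.ravel(), bins=levels, range=(0, levels))
    cdf = hist.cumsum()
    cdf = (cdf - cdf.min()) / (cdf.max() - cdf.min())  # normalize to 0..1
    lut = np.round(cdf * (levels - 1)).astype(img.dtype)
    return lut[img]     # apply the lookup table to every pixel
```

The chunk warning above falls straight out of this: the lookup table is built from the histogram, so two chunks with different histograms get different lookup tables and therefore visibly different stretches.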

Check the next posts for how to use it and semi-cartographic examples!

PS: Any feedback on RCP is welcome!