Friday, October 19, 2018

raster_chunk_processing.py Examples

This is part 3 of a three-part post. Read part 1 (Introduction) here and part 2 (Installation and Usage) here.

Ok, enough dry words. Let's see the goods!

These are all clips of the Grand Canyon of the Yellowstone River/Dunraven Pass area from the USGS 1/3 arc-second DEMs. For the smoothed DEMs, I created a hillshade with the gdaldem hillshade -combined command. The elevation shade is the North_America_State_Utah_2 ramp from the ESRI Terrain Tools download.

Original Hillshade

Mean Radius 10 Hillshade
The simplest smoothing is averaging the values of the nearby pixels. This gets rid of noise, but it also gets rid of the ridges and valleys. The sharp brown line is an error from importing the Terrain Tools' ArcMap styles into ArcGIS Pro.

Gauss Radius 10 Hillshade
A Gaussian kernel does a better job of smoothing away the high-frequency noise (i.e., small, sudden changes in elevation) while preserving the broader changes. Note the smoothing on the sides of the canyon, and the dampening of the contour-like artifacts in the gullies.

mdenoise Hillshade
This is an aggressive smoothing setting; a similar effect might be possible with a higher-radius Gaussian blur.

Skymodel Hillshade
Behold the bold shades of the skymodel (using the original, unsmoothed DEM). With such tweakability available, this is but one possible output from this DEM. Much experimentation is needed to see if it's worth using for your data. Note that this particular output can be approximated by a simple slope calculation with a white-to-black, shallow-to-steep color ramp.

CLAHE
This is a CLAHE stretch on the unsmoothed DEM's hillshade with a radius of 30 and a clip of 0.01. CLAHE gives a slightly different look than what's possible with the contrast stretcher tools in ArcGIS. It also permanently alters the file, which can be useful for bump-mapping imagery with a hillshade.

TPI 10
TPI gives different looks depending on the radius you use. It's usually symbolized with a green-yellow-red color ramp, but I like to use it as a layer in a terrain stack (set partially transparent, it can help highlight some of the more exciting terrain areas) either as is or hillshaded. This is the raw output from a kernel radius of 10.

TPI 10 Hillshade
Here's the raw hillshade from the TPI 10 dataset. It's definitely not something useful on its own; what it's really showing is channels and ridgelines (which is what raw TPI is often used to identify).

TPI 10 Combined
And here's the TPI 10 hillshade at 75% transparency layered on top of a smoothed skymodel hillshade. Look at how well-defined the draws and ridgelines are; this could make for a great basemap for a stream map.

TPI 30 Hillshade
We can get a slightly different look just by increasing the kernel radius to 30. We lose the sense of height in Mount Washburn, but the ridge lines really stand out.

Default Hillshade
Here's the default hillshade (without the -combined option) just to remind us of where we came from. Quite the snoozer, if terrain is your thing. Also note how the massive amounts of grey midtones muddle the colors of the elevation ramp. This is particularly abhorrent when laying it over imagery.



raster_chunk_processing.py Installation and Usage

This is part 2 of a three-part post. Read part 1 (Introduction) here and part 3 (Example Output) here.

Installation

Git Repo
RCP lives in the https://github.com/jacobdadams/rcp git repo, along with some other playground scripts and the environment.yml file for easily recreating the conda environment.

Prerequisites
RCP is written in Python 3 and relies on the numpy, astropy, scikit-image, numba, and gdal libraries. The recommended way of installing these packages is from the conda-forge channel using the conda package management system.

After you've installed conda, use the environment.yml file in the git repo to automatically create the rcp conda environment:

(NOTE: The environment.yml file currently does not include the numba package. It will be updated later.)
conda env create -f environment.yml

Or, just install the needed packages thus:
conda install numpy numba astropy scikit-image gdal

The package solver will take a while to get this set up. This environment is configured to avoid problems with the default channel's GDAL not supporting BigTIFFs, an issue with the solver not handling the conda-forge channel properly (see https://github.com/conda-forge/gdal-feedstock/issues/219), and changes to numpy that were throwing warnings in astropy (see https://github.com/astropy/astropy/issues/7816). Just using the normal gdal installation from conda-forge seems to work now.

See my earlier post on conda for a brief intro if you're new to this world.

It may be possible to run RCP in Python 2.7, but this has not been tested. Various small tweaks would probably be required to get it running.

The mdenoise option requires Sun et al.'s mdenoise executable from the author's webpage. Download the appropriate version and note its location; you'll need to set a variable to this path.

The skymodel option requires an illumination file, which can be generated by the skyLum.exe program written by Dr. James Stewart. This can be found in the skyModels.zip archive on the repo listed below. Thank you to Dr. Stewart for allowing me to redistribute this program. skyLum.exe is only compiled for Windows.

Installing

  1. Clone the git repo: https://github.com/jacobdadams/rcp
  2. Change the mdenoise_path variable to point to your downloaded mdenoise executable (it's towards the end of the script).
  3. Create the conda environment (if you haven't done so already): conda env create -f environment.yml
    1. Or, install the needed packages in an existing environment: conda install numpy numba astropy scikit-image gdal

Usage
python raster_chunk_processing.py -m [method] {general options} {method-specific options} input_file output_file

RCP is run by your Python interpreter and takes two types of options: general and method-specific. General options control the chunk size and processing, and they are required for most methods (they usually won't choose defaults if you don't specify anything). The method-specific options control the parameters for each method.

For a brief overview of all the options, run python raster_chunk_processing.py -h.

General Options
-m The processing method (blur_mean, blur_gauss, blur_toews, mdenoise, hillshade, skymodel, clahe, TPI).
-s The size of the chunks that the raster will be divided into for processing. mdenoise seems to fail on chunks larger than 1500, but for other methods it depends on how much memory you have and (for small rasters) how many cores you want to use at a single time.
-o The number of pixels to be read beyond each edge of the calculated chunk to avoid edge artifacts. I find something like 20 or 25 is sufficient. Some methods overwrite this setting to ensure they have good data.
-p Number of concurrent processes to run. Values larger than 1 will allow parallel processing if you have the cores available. I usually set this to n-1, where n is the number of cores I have. This is the one option with a default (1).
--verbose Toggles verbosity on. By default, RCP reports when each chunk was started, its index (i.e., 1 of 220), and the percentage of chunks already started/finished (i.e., 25%; this gets a little gray because, when running in parallel, a previously-ordered chunk in a separate stream may not be completed before this message is printed). With verbosity enabled, it also reports the value of all the options passed to it, and for each chunk it adds the pixel indices of the chunk (including overlaps) and the process ID. The output from the mdenoise executable will only be shown if the verbose flag is set.

Kernel Size Option
-r Radius (in pixels) to use for blur_mean, blur_toews, blur_gauss, and TPI. This is the main parameter to control the blurring effect. Play around with this on a subset of your data to get a feel for the best look.

blur_gauss Option
-d Standard deviation of the Gaussian distribution (sigma). Start with 1, and experiment from there. A large sigma for a given radius (say, greater than 1/2 the radius value) effectively turns it into a mean filter.

mdenoise Options
-n Iterations for normal updating. Start with something around 10 and play around from there. Large values of -n and -v will increase the runtime for each chunk.
-t Threshold value, between 0 and 1. Try starting with 0.6.
-v Iterations for vertex updating. Start with 20 and play around.

CLAHE Options
-c Clip limit, between 0 and 1. I usually use really small limits, like 0.01 or 0.05; 0.1 is usually as high as I'll go.
-k Kernel size for CLAHE. Start with 30.

Hillshading Options
-az Azimuth of the light source (defaults to the standard of 315 degrees).
-alt Altitude of the light source (defaults to the standard of 45 degrees).

Skymodel Options
-l Path to the luminance file generated by SkyLum.exe. The header lines must be deleted so that every line in the file is an az/elevation/weight tuple.
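For illustration, here's a tiny sketch of what reading the cleaned-up luminance file might look like. The function name and the delimiter handling (comma or whitespace, since the example file above has a .csv extension but the exact format isn't specified) are my assumptions, not RCP's actual code:

```python
# Hypothetical parser for a SkyLum luminance file whose header lines have
# already been removed, so each remaining line is an az/elevation/weight
# tuple. Delimiter handling is an assumption; adjust for your files.

def parse_luminance_file(lines):
    """Return a list of (azimuth, altitude, weight) float triples."""
    triples = []
    for line in lines:
        fields = line.replace(",", " ").split()
        if not fields:
            continue  # skip blank lines
        az, alt, weight = (float(f) for f in fields)
        triples.append((az, alt, weight))
    return triples

sample = ["315,45,0.6", "135,30,0.4"]
print(parse_luminance_file(sample))  # [(315.0, 45.0, 0.6), (135.0, 30.0, 0.4)]
```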

File Options
input_file The input DEM. This must currently be a single-band file in a format that GDAL can both read and create.
output_file The output file that will be created. A full-size but blank file is created at the beginning of the program and is then opened and updated each time a chunk is finished. The source file is used to determine the file type, projection, extent, and cell size of the output file (unless it's a VRT, in which case it defaults to a GeoTiff). If a GeoTiff is used, the output file is compressed with LZW, tiled, and bigtiff-enabled (-co compress=lzw, -co tiled=yes, -co bigtiff=yes). If you are using another format, you can edit the program to support different options (search for lzw_opts in the ParallelRCP() function).

Usage Examples
Ok, that was a lot of explanation. Here are some examples in action:

Gaussian Blur
python raster_chunk_processing.py -m blur_gauss -s 2000 -o 50 -p 3 -r 10 -d 1 dem_state.tif dem_state_gauss10.tif
(Took about a minute to process a 10k pixel by 10k pixel DEM).

mdenoise
python raster_chunk_processing.py -m mdenoise -s 1500 -o 50 -p 3 -n 20 -t 0.6 -v 30 dem_state.tif dem_state_md206030.tif
(Took about 8 minutes on the 10k by 10k DEM).

skymodel
python raster_chunk_processing.py -m skymodel -s 2000 -o 600 -p 3 -l 1_35_215_150.csv dem_state.tif dem_state_skymodel.tif



Introducing raster_chunk_processing.py (aka, RCP)

This is part 1 of a three-part post. Read part 2 (Installation and Usage) here and part 3 (Example Output) here.

After over a year of off-and-on development, I've finally got my second-favorite project ready for the masses (i.e., I finally added command-line argument parsing). I present to you the new, the shiny, raster_chunk_processing.py!

Edit, May 2019:
That feeling when you think you've got something working, only to discover that it's not doing what you think it's doing. RCP's skymodeling algorithm has been updated to properly compute shadows. Updated examples will be forthcoming.


Ok, so I need to work on my naming skills.



What Makes RCP Special?
RCP applies any one of several raster processing methods to arbitrarily large rasters by automatically breaking the raster up into smaller chunks. This a) avoids out-of-memory errors with large rasters, 32-bit programs, and/or the 8 or 16 GB memory loadouts of normal desktop computers and b) facilitates parallel processing when feasible.

What makes RCP better than re-tiling a large raster and processing the chunks individually? Many of these algorithms modify or create a new value for each pixel based on the pixels around them. When you run these on individual tiles and then merge them, the pixels on the edges of the tiles don't have all the needed info and produce a noticeable line in the final image. You also spend time and storage space tiling and merging.

RCP avoids these downsides by dividing the raster into chunks of a user-defined size, adding a few pixels (again, a user-defined amount) to each side of the chunk to avoid the aforementioned edge problems, then reading only this chunk into memory and working on it, and finally writing it into its proper location within the output file.
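As a rough illustration of that chunking scheme, here's a minimal sketch of how overlap-padded read windows and their matching write windows can be computed. This is my simplification, not RCP's actual implementation; the names and window layout are assumptions:

```python
# Sketch: divide a width x height raster into chunk_size squares, expanding
# each read window by `overlap` pixels on every side (clamped to the raster
# bounds) to avoid edge artifacts, while the write window stays exact.

def chunk_windows(width, height, chunk_size, overlap):
    """Yield (read_window, write_window) pairs, each (x_off, y_off, x_size, y_size)."""
    for y in range(0, height, chunk_size):
        for x in range(0, width, chunk_size):
            x_size = min(chunk_size, width - x)
            y_size = min(chunk_size, height - y)
            # Expand the read window by the overlap, clamped to the raster.
            rx = max(0, x - overlap)
            ry = max(0, y - overlap)
            rx_size = min(width, x + x_size + overlap) - rx
            ry_size = min(height, y + y_size + overlap) - ry
            yield (rx, ry, rx_size, ry_size), (x, y, x_size, y_size)

windows = list(chunk_windows(100, 100, 50, 10))
print(len(windows))  # 4 chunks for a 100x100 raster with 50-pixel chunks
```

The processing function then works on the padded read window and writes back only the interior write window, so every output pixel was computed with its full neighborhood available.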

RCP relies on GDAL for all the file access heavy lifting, and I owe a huge shout-out to Frank Warmerdam, Even Rouault, and the rest of the GDAL team.

Processing Methods
Currently, RCP can do several really cool things. The first is a selection of DEM smoothers (blur_mean, blur_gauss, blur_toews, and mdenoise). Next, and really, really fun, is Kennelly and Stewart's skymodel hillshade (along with a bog-standard hillshade algorithm copied shamelessly from the GDAL source). It will also calculate the Topographic Position Index (TPI) with a user-defined window, a step up from the default gdaldem TPI, which uses a fixed window size. Finally, it provides a really esoteric contrast stretcher called Contrast Limited Adaptive Histogram Equalization (CLAHE).

DEM Smoothing
Often, a DEM comes to us with some imperfections. Maybe that new half-foot Lidar is too detailed. Maybe you're tired of those contour-line-esque artifacts in the NED 1/3 arc-second DEMs. Maybe you don't want to show so much detail, but want to maintain the crisp mountain ridges. This is where smoothing helps: in preparation for hillshading or some other cartographic, non-analytical process.

blur_mean smooths the DEM by taking the average of all the pixels within a circle of user-defined radius around each pixel. This is similar to the ArcGIS Focal Statistics (Mean) tool with a circular neighborhood.
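A naive sketch of what that circular averaging looks like (illustrative only; RCP's real version is vectorized and handles nodata, and the edge padding here stands in for the chunk overlap):

```python
import numpy as np

def circular_kernel(radius):
    """Boolean disk of the given pixel radius."""
    y, x = np.mgrid[-radius:radius + 1, -radius:radius + 1]
    return (x**2 + y**2) <= radius**2

def blur_mean(dem, radius):
    """Average each pixel over a circular neighborhood (naive loop version)."""
    kernel = circular_kernel(radius)
    pad = np.pad(dem.astype(float), radius, mode="edge")
    out = np.empty_like(dem, dtype=float)
    for i in range(dem.shape[0]):
        for j in range(dem.shape[1]):
            window = pad[i:i + 2 * radius + 1, j:j + 2 * radius + 1]
            out[i, j] = window[kernel].mean()
    return out
```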

blur_gauss is similar to blur_mean, but it uses a 2D Gaussian distribution kernel for its average, providing more detailed results than a regular blur_mean. You could do this with a custom neighborhood in Focal Statistics, but RCP makes it trivial to change the neighborhood size.
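For the curious, here's roughly how such a Gaussian kernel can be built and normalized. This is a sketch; RCP may build its kernel differently (e.g., via astropy's convolution kernels). Note how a very large sigma flattens the weights toward a plain mean filter:

```python
import numpy as np

def gaussian_kernel(radius, sigma):
    """2D Gaussian weights on a (2*radius+1) square grid, normalized to sum to 1."""
    y, x = np.mgrid[-radius:radius + 1, -radius:radius + 1]
    k = np.exp(-(x**2 + y**2) / (2.0 * sigma**2))
    return k / k.sum()

k = gaussian_kernel(10, 1)
print(k.shape, round(k.sum(), 6))  # (21, 21) 1.0

# With sigma much larger than the radius, the weights are nearly uniform:
flat = gaussian_kernel(2, 50)
print(flat.max() - flat.min() < 1e-3)  # True
```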

blur_toews is a smoothing kernel provided by Mike T. at https://gis.stackexchange.com/a/10467. It is slightly different from blur_mean and blur_gauss and may give better results depending on the terrain.

mdenoise uses the mdenoise.exe program that implements Sun et al.'s (2007) Mesh Denoise algorithm, again as inspired by Mike T. at https://gis.stackexchange.com/a/12856. The compiled .exe is 32-bit and can only handle small rasters before blowing up and running out of memory. By limiting the size of the chunk we're working on, RCP allows us to use this executable as-is (along with automating the tif->asc->tif conversion process) and to parallelize the process.

Hillshades
My first impetus for this script was Kennelly and Stewart's skymodel hillshade technique, as featured on the cover of the 2016 ESRI Map Book. More optimization work needs to be done, but my ultimate goal is to get this working much faster than ESRI's Terrain Tools script tool through parallelization (and maybe offloading to the GPU).

skymodel is the aforementioned implementation of the skymodel technique, using Dr. Stewart's SkyLum.exe to generate a list of different hillshade azimuths and altitudes that are then combined into the final image.

hillshade is the bog-standard hillshading algorithm lifted from the gdaldem code. This is mainly used to generate all the different hillshades for the skymodel function.
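As a rough sketch of that standard formula (my approximation using numpy gradients, not the GDAL code itself; azimuth and altitude defaults mirror the -az and -alt options):

```python
import numpy as np

def hillshade(dem, azimuth=315.0, altitude=45.0, cell_size=1.0):
    """Return hillshade values in [0, 1] for a 2D elevation array."""
    az = np.radians(360.0 - azimuth + 90.0)  # compass azimuth -> math angle
    alt = np.radians(altitude)
    # Approximate the elevation partial derivatives with numpy.gradient.
    dy, dx = np.gradient(dem.astype(float), cell_size)
    slope = np.arctan(np.hypot(dx, dy))
    aspect = np.arctan2(dy, -dx)
    shaded = (np.sin(alt) * np.cos(slope)
              + np.cos(alt) * np.sin(slope) * np.cos(az - aspect))
    return np.clip(shaded, 0.0, 1.0)
```

A flat DEM shades to a uniform sin(altitude), which is why a default hillshade of gentle terrain is dominated by grey midtones.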

Other Utilities
There are two other utilities I've included as I've played around with different raster techniques.

TPI is the normal Topographic Position Index calculation of the pixel value minus the mean of the surrounding pixels, which can then be categorized and displayed as desired. Coincidentally, this is also a high-pass version of the mean filter (see http://fourier.eng.hmc.edu/e101/lectures/Fourier_Analysis/node10.html and http://northstar-www.dartmouth.edu/doc/idl/html_6.2/Filtering_an_Imagehvr.html). Hillshading the results of TPI can be rather interesting, depending on the TPI kernel width.
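That "pixel minus the mean of the surrounding pixels" definition is simple enough to sketch directly. This is a naive loop version with a square window for brevity; RCP's kernel shape and edge handling may differ:

```python
import numpy as np

def tpi(dem, radius):
    """Topographic Position Index: cell value minus its neighborhood mean."""
    size = 2 * radius + 1
    pad = np.pad(dem.astype(float), radius, mode="edge")
    out = np.empty_like(dem, dtype=float)
    for i in range(dem.shape[0]):
        for j in range(dem.shape[1]):
            out[i, j] = dem[i, j] - pad[i:i + size, j:j + size].mean()
    return out
```

Positive values mark cells higher than their surroundings (ridges), negative values mark channels, and flat terrain comes out near zero.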

CLAHE is some really cool image magic that increases the contrast of your image in neat ways (https://imagej.net/Enhance_Local_Contrast_(CLAHE) and https://en.wikipedia.org/wiki/Adaptive_histogram_equalization#Contrast_Limited_AHE). Yeah, I don't quite understand all the implications myself... but it can look cool! I've used it to increase the contrast of hillshades that I've then multiplied into aerial imagery to "bumpmap" the imagery. Increasing the contrast allows me to darken the shadows without increasing the darkness of the grey midtones.

WARNING: CLAHE relies on knowing the histogram of the entire image, so running it with chunks can cause visibly different stretches for each chunk. If you make your chunks large enough that the histograms are roughly the same, this difference nearly disappears.

For example, if one chunk covers mainly valley floors (where there are few hills and thus few shadows) and another chunk covers mountains (with lots of whites and lots of darks), they're going to have different histograms and thus different stretches. However, if your chunks are big enough that they cover both valley and mountains, the histograms should be similar enough to get away with it.

Check the next posts for how to use it and semi-cartographic examples!

PS: Any feedback on RCP is welcome!