Friday, October 19, 2018

raster_chunk_processing.py Installation and Usage

This is part 2 of a three-part series. Read part 1 (Introduction) here and part 3 (Example Output) here.

Installation

Git Repo
RCP lives in the https://github.com/jacobdadams/rcp git repo, along with some other playground scripts and the environment.yml file for easily recreating the conda environment.

Prerequisites
RCP is written in Python 3 and relies on the numpy, astropy, scikit-image, numba, and gdal libraries. The recommended way of installing these packages is from the conda-forge channel using the conda package management system.

After you've installed conda, use the environment.yml file in the git repo to automatically create the rcp conda environment:

(NOTE: The environment.yml file currently does not include the numba package. It will be updated later.)
conda env create -f environment.yml

Or, just install the needed packages thus:
conda install numpy numba astropy scikit-image gdal

The package solver will take a while to get this set up. This environment is configured to avoid problems with the default channel's GDAL not supporting bigtiffs, an issue with the solver not handling the conda-forge channel properly (see https://github.com/conda-forge/gdal-feedstock/issues/219), and changes to numpy that are throwing warnings in astropy (see https://github.com/astropy/astropy/issues/7816). Just using the normal gdal installation from conda-forge seems to work now.
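
Once the environment exists, activate it before running RCP (assuming it's named rcp, as it is when created from environment.yml):
conda activate rcp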

See my earlier post on conda for a brief intro if you're new to this world.

It may be possible to run RCP in Python 2.7, but this has not been tested. Various small tweaks would probably be required to get it running.

The mdenoise option requires Sun et al.'s mdenoise executable from the author's webpage. Download the appropriate version and note its location; you'll need to point a variable in the script at this path.

The skymodel option requires an illumination file, which can be generated by the skyLum.exe program written by Dr. James Stewart. This can be found in the skyModels.zip archive on the repo listed below. Thank you to Dr. Stewart for allowing me to redistribute this program. skyLum.exe is only compiled for Windows.

Installing

  1. Clone the git repo: https://github.com/jacobdadams/rcp
  2. Change the mdenoise_path variable to point to your downloaded mdenoise executable (it's towards the end of the script; see the sketch after this list).
  3. Create the conda environment (if you haven't done so already): conda env create -f environment.yml
    1. Or, install the needed packages in an existing environment: conda install numpy numba astropy scikit-image gdal
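
For step 2, the change is a single assignment near the end of raster_chunk_processing.py; the path below is just a placeholder for wherever you saved the executable:

mdenoise_path = r"C:\tools\mdenoise\mdenoise.exe"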

Usage
python raster_chunk_processing.py -m [method] {general options} {method-specific options} input_file output_file

RCP is run by your python interpreter and takes two types of options: general and method-specific. The general options control chunk size and processing and are required for most methods (most of them have no defaults, so you usually need to specify them explicitly). The method-specific options control the parameters for each method.

For a brief overview of all the options, run python raster_chunk_processing.py -h.

General Options
-m The processing method (blur_mean, blur_gauss, blur_toews, mdenoise, hillshade, skymodel, clahe, TPI).
-s The size of the chunks that the raster will be divided into for processing. mdenoise seems to fail on chunks larger than 1500; for other methods it depends on how much memory you have and (for small rasters) how many cores you want to use at once.
-o The number of pixels to be read beyond each edge of the calculated chunk to avoid edge artifacts. I find something like 20 or 25 is sufficient. Some methods override this setting to ensure they have good data.
-p Number of concurrent processes to run. Values larger than 1 allow parallel processing if you have the cores available. I usually set this to n-1, where n is the number of cores I have. This is the one general option with a default (1).
--verbose Toggles verbosity on. By default, RCP reports when each chunk was started, its index (i.e., 1 of 220), and the percentage of chunks already started/finished (i.e., 25%; this gets a little fuzzy because, when running in parallel, an earlier chunk in a separate process may not be finished before this message is printed). With verbosity enabled, it also reports the value of every option passed to it, and for each chunk it adds the pixel indices of the chunk (including overlaps) and the process ID. The output from the mdenoise executable is only shown if the verbose flag is set.
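
As a quick illustration of the general options on their own, a hillshade run (which needs no extra parameters if you accept the default azimuth and altitude) might look like the following; the filenames are placeholders:

python raster_chunk_processing.py -m hillshade -s 2000 -o 25 -p 3 --verbose in_dem.tif out_hillshade.tif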

Kernel Size Option
-r Radius (in pixels) to use for blur_mean, blur_toews, blur_gauss, and TPI. This is the main parameter to control the blurring effect. Play around with this on a subset of your data to get a feel for the best look.
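
Roughly speaking, blur_mean averages each cell's neighborhood within this radius, and TPI is commonly defined as the cell value minus that neighborhood mean. Here's a minimal sketch of that idea using astropy's convolution tools (not RCP's exact implementation; the array is a stand-in for a DEM chunk):

import numpy as np
from astropy.convolution import Tophat2DKernel, convolve

dem = np.random.random((100, 100))   # placeholder for a chunk of elevation data
radius = 10                          # corresponds to the -r option

kernel = Tophat2DKernel(radius)            # flat, circular kernel of the given radius
neighborhood_mean = convolve(dem, kernel)  # blur_mean-style smoothing
tpi = dem - neighborhood_mean              # TPI: cell value minus neighborhood mean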

blur_gauss Option
-d Standard deviation of the Gaussian distribution (sigma). Start with 1, and experiment from there. A large sigma for a given radius (say, greater than 1/2 the radius value) effectively turns it into a mean filter.
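
To see why a large sigma relative to the radius behaves like a mean filter, consider how a Gaussian kernel truncated to the -r window might be built (again a sketch with astropy, not necessarily RCP's exact code):

import numpy as np
from astropy.convolution import Gaussian2DKernel, convolve

dem = np.random.random((100, 100))   # placeholder chunk
radius = 10                          # -r option
sigma = 1                            # -d option

# Truncate the Gaussian to a (2 * radius + 1) square window. Once sigma grows well
# past the radius, the weights inside the window are nearly uniform, so the result
# approaches a plain mean filter.
kernel = Gaussian2DKernel(x_stddev=sigma, x_size=2 * radius + 1, y_size=2 * radius + 1)
blurred = convolve(dem, kernel)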

mdenoise Options
-n Iterations for normal updating. Start with something around 10 and play around from there. Large values of -n and -v will increase the runtime for each chunk.
-t Threshold value, between 0 and 1. Try starting with 0.6.
-v Iterations for vertex updating. Start with 20 and play around.
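
Behind the scenes, RCP hands each chunk to the external mdenoise executable and captures its output. The sketch below shows that kind of call; the flag names are an assumption (they mirror RCP's -n/-t/-v options) and the file names are placeholders, so check them against the mdenoise documentation:

import subprocess

mdenoise_path = r"C:\tools\mdenoise\mdenoise.exe"   # placeholder path

# Assumed flags; verify against the mdenoise docs for your version.
cmd = [mdenoise_path, "-i", "chunk_in.asc", "-o", "chunk_out.asc",
       "-n", "20", "-t", "0.6", "-v", "30"]
result = subprocess.run(cmd, capture_output=True, text=True)
print(result.stdout)   # RCP only shows this output when --verbose is set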

CLAHE Options
-c Clip limit, between 0 and 1. I usually use really small limits, like 0.01 or 0.05; 0.1 is usually as high as I'll go.
-k Kernel size for clahe. Start with 30.
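
RCP lists scikit-image as a dependency, and scikit-image exposes CLAHE as exposure.equalize_adapthist. A standalone sketch of that call (the rescaling and array names are illustrative, not RCP's exact code):

import numpy as np
from skimage import exposure

dem = np.random.random((500, 500))   # placeholder chunk

# equalize_adapthist expects values in [0, 1], so rescale the elevations first.
scaled = (dem - dem.min()) / (dem.max() - dem.min())

# clip_limit corresponds to -c and kernel_size to -k.
stretched = exposure.equalize_adapthist(scaled, kernel_size=30, clip_limit=0.01)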

Hillshading Options
-az Azimuth of the light source (defaults to the standard of 315 degrees).
-alt Altitude of the light source (defaults to the standard of 45 degrees).
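
For reference, a common numpy formulation of a single-source hillshade looks like the sketch below (the textbook formula, ignoring cell size and vertical exaggeration; not necessarily RCP's exact implementation):

import numpy as np

def simple_hillshade(dem, azimuth=315.0, altitude=45.0):
    # Finite-difference gradients of the elevation surface
    grad_y, grad_x = np.gradient(dem)
    slope = np.pi / 2.0 - np.arctan(np.sqrt(grad_x ** 2 + grad_y ** 2))
    aspect = np.arctan2(-grad_y, grad_x)
    az_rad = np.radians(azimuth)
    alt_rad = np.radians(altitude)
    shaded = (np.sin(alt_rad) * np.sin(slope)
              + np.cos(alt_rad) * np.cos(slope) * np.cos((az_rad - np.pi / 2.0) - aspect))
    return 255 * (shaded + 1) / 2   # rescale from [-1, 1] to [0, 255]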

Skymodel Options
-l Path to the luminance file generated by SkyLum.exe. The header lines must be deleted so that every line in the file is an azimuth/elevation/weight tuple.
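
If you'd rather not delete the header by hand, one option is to keep only the lines that parse as three numbers. This is a generic sketch; the output filename is a placeholder and the delimiter is an assumption based on the .csv extension:

def is_data_line(line, sep=","):
    parts = line.strip().split(sep)
    if len(parts) != 3:
        return False
    try:
        [float(p) for p in parts]
        return True
    except ValueError:
        return False

with open("1_35_215_150.csv") as src, open("luminance_clean.csv", "w") as dst:
    dst.writelines(line for line in src if is_data_line(line))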

File Options
input_file The input DEM. This must currently be a single-band file in a format that GDAL can both read and create.
output_file The output file that will be created. A full-size but blank file is created at the beginning of the program and is then opened and updated each time a chunk is finished. The source file is used to determine the file type, projection, extent, and cell size of the output file (unless it's a VRT, in which case it defaults to a GeoTiff). If a GeoTiff is used, the output file is compressed with LZW, tiled, and bigtiff-enabled (-co compress=lzw, -co tiled=yes, -co bigtiff=yes). If you are using another format, you can edit the program to support different options (search for lzw_opts in the ParallelRCP() function).
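
The creation step described above corresponds roughly to the following GDAL calls (simplified; the output name and data type are placeholders, and the real logic lives in ParallelRCP()):

from osgeo import gdal

src = gdal.Open("dem_state.tif")
driver = gdal.GetDriverByName("GTiff")

# The same creation options mentioned above: LZW compression, tiling, BigTIFF.
opts = ["COMPRESS=LZW", "TILED=YES", "BIGTIFF=YES"]

dst = driver.Create("dem_state_out.tif",
                    src.RasterXSize, src.RasterYSize,
                    1, gdal.GDT_Float32, options=opts)
dst.SetGeoTransform(src.GetGeoTransform())
dst.SetProjection(src.GetProjection())
dst = None   # flush and close; finished chunks are then written into this file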

Usage Examples
Ok, that was a lot of explanation. Here are some examples in action:

Gaussian Blur
python raster_chunk_processing.py -m blur_gauss -s 2000 -o 50 -p 3 -r 10 -d 1 dem_state.tif dem_state_gauss10.tif
(Took about a minute to process a 10k pixel by 10k pixel DEM).

mdenoise
python raster_chunk_processing.py -m mdenoise -s 1500 -o 50 -p 3 -n 20 -t 0.6 -v 30 dem_state.tif dem_state_md206030.tif
(Took about 8 minutes on the 10k by 10k DEM).

skymodel
python raster_chunk_processing.py -m skymodel -s 2000 -o 600 -p 3 -l 1_35_215_150.csv dem_state.tif dem_state_skymodel.tif