Abstract
Integrating R with Geographic Information Systems (GIS) extends R’s statistical capabilities with numerous geoprocessing and data handling tools available in a GIS. QGIS is one of the most popular open-source GIS, and it furthermore integrates other GIS programs such as the System for Automated Geoscientific Analyses (SAGA) GIS and the Geographic Resources Analysis Support System (GRASS) GIS within a single software environment. This and its QGIS Python API makes it a perfect candidate for console-based geoprocessing. By establishing an interface, the R package RQGIS makes it possible to use QGIS as a geoprocessing workhorse from within R. Compared to other packages building a bridge to GIS (e.g., rgrass7, RSAGA, RPyGeo), RQGIS offers a wider range of geoalgorithms, and is often easier to use due to various convenience functions. Finally, RQGIS supports the seamless integration of Python code using reticulate from within R for improved extendability.Defining a GIS as a system for the analysis, manipulation and visualization of geographical data (Longley et al. 2011), one could argue that R has become a GIS (R. Bivand, Pebesma, and Gómez-Rubio 2013). In great part this is thanks to packages that provide spatial classes and algorithms coded in and for R (despite this these packages might also link to other software outside of R). These include maptools (Roger Bivand and Lewin-Koh 2017), raster (Hijmans 2017), sp (R. Bivand, Pebesma, and Gómez-Rubio 2013) and sf (Pebesma 2017). Further packages even extend R’s GIS capabilities through advanced mapping, e.g., mapview (Appelhans et al. 2017) and mapmisc (Brown 2016), and routing, e.g., osmar (Eugster and Schlesinger 2013) and dodgr (Padgham and Peutschnig 2017), among others. Despite this, native R (in the sense of coded in and for R) lacks fundamental GIS capabilities. GIS topology and topological operations are only partially (RArcInfo, Gómez-Rubio and López-Quílez 2005) or indirectly available via rgrass7 (Roger Bivand 2017). Furthermore, R is neither a spatial database management system nor especially good at the manipulation of large data sets (Ripley, Bates, and DebRoy 2016). Hence, computationally demanding GIS operations (point cloud processing, overlay operations on ‘big’ spatial data) executed in R may be rather slow. Performance and scalability, of course, depend on the computer hardware, and cloud computing may eventually alleviate or even settle this problem. Yet, most R users most likely still work on a local machine. What is more, R is lacking a number of fundamental GIS operations such as the derivation of various terrain attributes from a digital elevation model (DEM). And the same is true for 3D data visualization and voxel processing (Tomislav Hengl et al. 2015). Finally, though interactive tasks such as digitizing of geodata have become possible within R very recently (mapedit, Appelhans and Russell 2017), extensive manual editing is better done with the help of a GIS.
Many of R’s geospatial shortcomings could potentially be addressed
through R programming directly. However, R was designed from the very
beginning as an interactive interface to the algorithms of other
software (Chambers 2016). Hence, it is
unnecessary and even counterproductive to duplicate the functionality
provided by an existing dedicated software with an expert developer and
user community as long as there is a way to access it from within R.
Therefore, it is barely surprising that numerous R packages provide
access to third-party geoprocessing tools (Figure 1), only some of which will be discussed
here. rgdal (Roger Bivand, Keitt, and Rowlingson 2017)
accesses the geospatial data abstraction library (GDAL/OGR) (GDAL Development Team 2017). rgeos (Roger Bivand and Rundel 2017) is an interface
to geometry engine - open source (GEOS, GEOS
Development Team 2017), which opens the way to GIS vector
operations. However, GEOS performance is somewhat limited. Think, for
instance, of the spatial union of all US American census tracts and
postal code layers, and it may be quite possible that
rgeos::gUnion may take a very long time. The successor of
the sp package, package sf combines the functionality
of sp (spatial classes), rgdal (here: import/export of
spatial vector data) and rgeos (geometrical operations) in just
one package. Note also that GEOS is a C API for topology operations on
geometries. Consequently, it expects topologically correct data. To make
sure that our geodata lives up to topological expectations in general,
our best approach is probably through another third-party integration,
namely R-GRASS (R. Bivand 2007; Roger Bivand
2017). Additionally, GRASS GIS comprises a large suite of vector
and raster functions. Basically, the user has to set up a spatial
database before being able to use GRASS’s geoprocessing utilities (Neteler and Mitasova 2008). Hence, less
experienced GIS users will likely prefer faster-to-use GIS interfaces
also providing extensive geoprocessing capabilities. In particular, RSAGA (Alexander Brenning, Koszinski, and Sommer 2008)
integrates R with SAGA (Conrad et al.
2015) and RPyGeo
(Alexander Brenning 2012a) provides an
interface to ArcGIS (ESRI 2017), which is
probably still the most popular GIS environment in the world with >1
million users and the greatest market share among proprietary GIS (Longley et al. 2011).
What has been missing, however, is an R interface to one of the most widely used open-source GIS, QGIS (QGIS Development Team 2017; Graser and Olaya 2015). So far, the QGIS processing toolbox provided only the opposite interface by letting the user integrate R scripts as a user-defined ‘tool’ in QGIS. This is fine for people unwilling to use R directly. However, interfacing from R to QGIS has multiple benefits to the R user community. First and foremost, native QGIS geoalgorithms are now available from within R for the first time. Moreover, it is a special feature of QGIS that it acts as an umbrella integrating various other GIS power houses under its hood. These include SAGA, GRASS, GDAL, the Orfeo Toolbox (Inglada and Christophe 2009), TauDEM (Tarboton and Mohammed 2017) and additional tools for light detection and ranging (LIDAR) data (Rapidlasso 2017). RQGIS (Muenchow and Schratz 2017) brings this incredibly powerful geoprocessing environment to the R console in just one package. This, however, does not mean that specialized packages such as RSAGA and rgrass7 (R. Bivand 2007) will become obsolete, as discussed later. RQGIS also aims to be user-friendly by automatically retrieving GIS function parameter names and corresponding default values as well as supporting R named arguments for geoalgorithm parameters through the ellipsis argument.
In general, R–GIS interfaces open the way to extremely powerful and innovative statistical geoprocessing as for example shown by Alexander Brenning (2008), T. Hengl, Heuvelink, and van Loon (2010), Muenchow, Brenning, and Richter (2012), Vanselow and Samimi (2014), A. Brenning et al. (2015) M. Mergili, Krenn, and Chu (2015), Martin Mergili and Kerschner (2015), Poggio and Gimona (2015) and Zandler, Brenning, and Samimi (2015). In this paper we will first introduce the general architecture and main features of the RQGIS package. We will then demonstrate the application of this integrated scientific programming approach with an ecological example. Subsequently, we will show how to easily complement and extend RQGIS with Python programming, especially PyQGIS (Sherman 2014). In our discussion, we will finally compare and contrast RQGIS with other approaches to R–GIS integration, and provide an outlook and motivation for future developments.
The RQGIS package utilizes the QGIS Python API in order to access QGIS modules. To successfully run the QGIS Python API, RQGIS first sets up all required environment variables (Figure 2). And secondly, it establishes a tunnel to Python using reticulate (Allaire, Tang, and Geelnard 2017) - a package providing an R interface to Python. The older package rPython (Bellosta 2015) is similar to reticulate, however, it is only available for Unix-based systems which is why we had to dismiss it as an option for RQGIS. With reticulate, we set up the Python environment only once, and use the resulting tunnel to exchange functions and objects between R and Python seamlessly.
We can divide RQGIS roughly into two major components:
The Python code (python_funs.py located in
inst/python of RQGIS) defines a Python class named
"RQGIS" with methods to be called during the geoprocessing.
Defining an own class has the additional benefit that it becomes highly
unlikely that (advanced) users interacting with the QGIS Python API
accidentally overwrite some of our predefined methods.
The processing.R file (found in the R
folder of RQGIS) actually establishes the QGIS Python
interface, and lets the user run QGIS from within R. The most important
functions are (see also 2.2 for a detailed
description):
open_app() to establish a tunnel to Python and a
QGIS custom application
find_algorithms() to retrieve the QGIS command-line
names for all available geoalgorithms
open_help() and get_args_man() to
access help resources as well as function arguments and default
values
run_qgis() to call QGIS geoalgorithms from within
R
The most notable features of RQGIS are:
For the first time, native QGIS algorithms are available from within
Additionally, RQGIS provides access to hundreds of third party geoalgorithms including GDAL, GRASS GIS and SAGA GIS. In the future many more integrations can be expected. For instance, there is already a plugin providing access to PostGIS geoprocessing tools (clip, dissolve, distance, etc.) available in the QGIS processing toolbox (https://plugins.qgis.org/plugins/postgis_geoprocessing/).
R users can stay in their preferred programming language without having to touch Python.
Convenient access to QGIS help resources facilitates the
geoprocessing work flow. While open_help accesses the QGIS
online help for a specific geoalgorithm, get_args_man()
retrieves function arguments and their default values.
run_qgis() also accepts "sf",
"sp" and "raster" objects as arguments.
Similarly, users may directly load the QGIS output into R by setting
load_output to TRUE when using
run_qgis().
Since RQGIS is an interface to various GIS software packages, the user needs to install this software beforehand. To facilitate the installation process we have written an installation guide, see http://jannes-m.github.io/RQGIS/articles/install_guide.html. Or after having installed the package, one can also access the corresponding vignette by typing:
vignette("install_guide", package = "RQGIS")
We will demonstrate the usage of RQGIS by showing how to
compute the plan and tangential curvatures of a digital elevation model
(DEM). The first thing to do is to make sure, that all paths are set
correctly to successfully run the Python API from within R. Function
set_env() facilitates this since the user only needs to
specify the root path to the QGIS installation. If the root path remains
unspecified, set_env() tries to be smart by checking the
default QGIS installation directories. If this is unsuccessful,
set_env() will try to find the QGIS installation on the
computer which may be time-consuming especially on Windows machines. A
much faster way is to explicitly indicate the root path. For Windows
this might look like this
qgis_env <- set_env(root = ’C:/OSGEO4~1’).
Subsequently, set_env() finds all required paths.
Virtually all subsequent RQGIS functions require the output
list of set_env(). This is why, RQGIS
automatically caches the output of set_env(), and reuses it
when required by another function later on. To establish a tunnel to the
QGIS Python API, we run open_app(). Explicitly, the
function sets all necessary paths (e.g., path to the QGIS Python binary)
to successfully run QGIS, and secondly opens a QGIS custom application
(i.e., outside of the QGIS GUI interface) while importing necessary
Python modules.
library("RQGIS")
set_env()
open_app()
Running set_env() and open_app() is
optional here since all subsequent functions dependent on their output
will run them automatically in case they have not been executed before.
To work in a reproducible manner, and to find out which QGIS and
third-party GIS versions we are using, we execute:
## $root
## [1] "C:/OSGeo4W64"
##
## $qgis_prefix_path
## [1] "C:/OSGeo4W64/apps/qgis"
##
## $python_plugins
## [1] "C:/OSGeo4W64/apps/qgis/python/plugins"
info_r <- version
info_qgis <- qgis_session_info()
c(platform = info_r$platform, R = info_r$version.string, info_qgis)
## $platform
## [1] "x86_64-w64-mingw32"
##
## $R
## [1] "R version 3.4.2 (2017-09-28)"
##
## $qgis_version
## [1] "2.18.14"
##
## $gdal
## [1] "2.2.2"
##
## $grass6
## [1] "6.4.3"
##
## $grass7
## [1] "7.2.2"
##
## $saga
## [1] "2.3.2"
Continuing with our analysis, we need to find out the command-line
name of a geoalgorithm available in QGIS that computes the curvatures
from a DEM. find_algorithms() lets the user use regular
expressions to search for a function which contains the search terms in
its short description. Leaving the search_term-argument
empty, will return all available geoalgorithms. Here, we assume that the
function we are looking for contains the word ‘curvature’ in its short
description. Setting name_only to TRUE gives
back the name of the geoalgorithm instead of its name plus the
corresponding short description.
find_algorithms(search_term = "curvature",
name_only = TRUE)
## [1] "grass7:r.slope.aspect" "saga:curvatureclassification"
## [3] "saga:slopeaspectcurvature" "saga:upslopeanddownslopecurvature"
## [5] "grass:r.slope.aspect"
Several functions are available for our task, we will go on with
function grass7:r.slope.aspect. To familiarize ourselves
with the function, we can access its online help by calling (not
shown):
open_help(alg = "grass7:r.slope.aspect")
Next, we would like to know how to use a specific geoalgorithm.
get_usage() prints the parameters and default values for a
given geoalgorithm to the console.
get_usage(alg = "grass7:r.slope.aspect")
## ALGORITHM: r.slope.aspect - Generates raster layers of slope
## aspect
## curvatures and partial derivatives from a elevation raster layer.
## elevation <ParameterRaster>
## format <ParameterSelection>
## precision <ParameterSelection>
## -a <ParameterBoolean>
## zscale <ParameterNumber>
## min_slope <ParameterNumber>
## GRASS_REGION_PARAMETER <ParameterExtent>
## GRASS_REGION_CELLSIZE_PARAMETER <ParameterNumber>
## slope <OutputRaster>
## aspect <OutputRaster>
## pcurvature <OutputRaster>
## tcurvature <OutputRaster>
## dx <OutputRaster>
## dy <OutputRaster>
## dxx <OutputRaster>
## dyy <OutputRaster>
## dxy <OutputRaster>
##
##
## format(Format for reporting the slope)
## 0 - degrees
## 1 - percent
## precision(Type of output aspect and slope layer)
## 0 - FCELL
## 1 - CELL
## 2 - DCELL
get_args_man() lets us retrieve automatically a
corresponding parameter-argument list. Setting the options
parameter to TRUE (the default) automatically chooses the
default value from a list of possible options for a parameter. This is
always the first option which is in accordance with the QGIS GUI
behavior. To make the user aware of the automatically chosen options,
get_args_man() prints the corresponding values to the
console.
params <- get_args_man(alg = "grass7:r.slope.aspect", options = TRUE)
## Choosing default values for following parameters:
## format: 0
## precision: 0
## See get_options('grass7:r.slope.aspect') for all available options.
grass7:r.slope.aspect has 17 parameters. Here, we only
show the first six parameters for brevity.
head(params)
## $elevation
## [1] "None"
##
## $format
## [1] "0"
##
## $precision
## [1] "0"
##
## $`-a`
## [1] "True"
##
## $zscale
## [1] "1.0"
##
## $min_slope
## [1] "0.0"
Next, we specify the required arguments. Of course,
grass7:r.slope.aspect expects a spatial input file, here a
DEM. Conveniently, run_qgis() accepts as input both a path
to a spatial file stored on disk or a spatial object residing in R’s
environment (specifically "raster"-, "sp"- and
"sf"-objects). Here, we use a DEM that comes as an example
file with the RQGIS package. Note that run_qgis()
simply saves spatial input files to a temporary output location. Hence,
if the file already exists on disk, it is much more efficient to
indicate the path to the file instead of loading it into R and letting
run_qgis() export it again. By contrast, indicating output
paths is not strictly necessary. If an output path parameter equals
None (QGIS default), QGIS automatically creates an output
file which it saves to a temporary processing output folder. In the code
chunk below, we specifically indicate curvature outputs while keeping
the default, i.e. None, for the remaining output parameters
slope, aspect, dx,
dy, dxx, dyy and dxy
(check out the GRASS function documentation for more information,
i.e. run open_help("grass7:r.slope.aspect")).
run_qgis() prints all output paths to the console if
show_output_paths is set to TRUE. Here, we turn this
behavior off for two reasons. First, we are not interested in the output
paths of the seven terrain attributes we left unspecified. Secondly, we
have explicitly specified the curvature output paths, i.e., we already
know the corresponding output locations. We recommend to specify those
output paths which are relevant to the analysis. Manual specification
has the additional benefit that we can indicate a specific file format
(QGIS default for raster data sets is in most cases .tif, but we might
want to use e.g., .asc or SAGA’s .sdat). Additionally, loading QGIS
output directly back into R (load_output-parameter) only
works with output paths specified by the user.
data("dem", package = "RQGIS")
out <- run_qgis(alg = "grass7:r.slope.aspect",
elevation = dem,
pcurvature = file.path(tempdir(), "pcurv.tif"),
tcurvature = file.path(tempdir(), "tcurv.tif"),
show_output_paths = FALSE,
load_output = TRUE)
Note that we used R named arguments in run_qgis(), i.e.,
we assigned values or objects to the parameters whose names we have
identified with the help of get_args_man() or
get_usage(). We could replace the R named arguments also by
a parameter-argument list. Remember that we have already created a
parameter-argument list named params using
get_args_man() (see above):
params$elevation <- dem
params$pcurvature <- file.path(tempdir(), "pcurv.tif")
params$tcurvature <- file.path(tempdir(), "tcurv.tif")
out <- run_qgis(alg = "grass7:r.slope.aspect",
params = params,
load_output = TRUE,
show_output_paths = FALSE)
class(out)
## [1] "list"
names(out)
## [1] "pcurvature" "tcurvature"
However, providing R named arguments and a parameter-argument list is
not possible, and run_qgis() will complain telling the user
to user either one of these but not a mixture. Since we have set
load_output to TRUE, run_qgis()
automatically loads the QGIS output into R. In this case, the object
out is a list with two "raster" objects. If we
only had specified one output raster, out would have been a
"RasterLayer" object. In case the output is a vector layer,
run_qgis() will load it as an "sf" object. To
have a look at the output, we can execute following code (not
shown):
library("raster")
plot(stack(out))
Concerning the handling of parameter-argument pairs,
run_qgis() uses get_args_man() through
pass_args() in the background to access the default values
of all missing arguments if available. If the user accidentally omits a
required argument, run_qgis() will return an error message
that informs about the missing argument. The help documentation of
pass_args() presents a detailed list of argument checks
that are run before executing run_qgis().
Experienced GRASS users may wonder if there is a need to specify the
GRASS_REGION_PARAMETER. pass_args() determines
this parameter automatically based on the spatial layers provided as
input by the user (in our example above this is dem).
However, the GRASS_REGION_PARAMETER can also be set
manually (see the pass_args() documentation for
details).
To show the utility of RQGIS in real-world applications, we combine QGIS functionality with R’s modeling and (geo-)statistical capabilities in an ecological study in the coastal desert of northern Peru (Figure 3). Despite the extreme aridity of the Mount Mongón region (200-1100 m asl), this area is the habitat of a distinct flora and fauna (Dillon, Nakazawa, and Leiva 2003). The unique vegetation, locally termed lomas, mainly survives due to heavy fog during the austral winter months (Muenchow, Feilhauer, et al. 2013; Muenchow, Hauenstein, et al. 2013).
Linking species richness to environmental predictors along gradients is a key topic of community ecology and biogeography (Muenchow et al. 2017), and the fundamental basis for conservation planning (Pomara et al. 2012). In our use case, we model vascular plant species richness along an altitudinal gradient as a function of topographic and remotely sensed variables by means of count regression. In the following, we will show a simplified version of an analysis by Muenchow, Feilhauer, et al. (2013).
Before running a Poisson model, we need to compute terrain attributes from a DEM. These will serve as predictors to model species richness. To account for the unimodal relationship between elevation and species richness, we use a second-order orthogonal polynomial function (Figure 4, Panel Elevation). In the original paper, we dropped the least significant variables one at a time until only significant predictors remained (elevation and its squared term, catchment slope, catchment area and the normalized difference vegetation index, NDVI). Due to space constraints and demonstration purposes, we will simply use the final model here (for details see Muenchow, Feilhauer, et al. 2013). Instead of calculating all predictors used in the original paper, we only show how to derive selected geospatial predictors using RQGIS, namely tangential and profile curvature, catchment slope and catchment area.
The numerical representation as well as the analysis of the land surface is frequently referred to as terrain analysis and terrain modeling. The corresponding surface-characterizing measures are known as terrain attributes (Pike, Evans, and Hengl 2008). Terrain attributes play an important role, for example, in pedometrics (McBratney et al. 2000), precision agriculture (Kühn et al. 2009), geomorphometry (Pike, Evans, and Hengl 2008) and ecology (Muenchow, Bräuning, et al. 2013). They are frequently related to slope stability (Montgomery and Dietrich 1994; Muenchow, Brenning, and Richter 2012). Additionally, they are proxies for variables representing water availability such as soil moisture, soil texture or moisture-holding capacity, among others (Alexander Brenning 2008; Franklin, McCullough, and Gray 2000; Muenchow, Hauenstein, et al. 2013). Especially the latter is of utmost importance regarding plant distribution in a desert environment. While GIS can easily calculate terrain attributes, R is rather limited in this respect. However, without terrain attributes, we would neither be able to model nor predict species richness appropriately.
First we would like to use SAGA to remove local depressions from the
DEM, since these may be artifacts. For this, we use the
[1] Fill Sinks method. Note that you also may use numbers
for specifying the option, here, the fill sinks method corresponds to
1.
get_usage("saga:sinkremoval")
## ALGORITHM: Sink removal
## DEM <ParameterRaster>
## SINKROUTE <ParameterRaster>
## METHOD <ParameterSelection>
## THRESHOLD <ParameterBoolean>
## THRSHEIGHT <ParameterNumber>
## _RESAMPLING <ParameterSelection>
## DEM_PREPROC <OutputRaster>
##
##
## METHOD(Method)
## 0 - [0] Deepen Drainage Routes
## 1 - [1] Fill Sinks
## _RESAMPLING(Resampling method)
## 0 - Nearest Neighbour
## 1 - Bilinear Interpolation
## 2 - Bicubic Spline Interpolation
## 3 - B-Spline Interpolation
run_qgis(alg = "saga:sinkremoval",
DEM = dem,
METHOD = "[1] Fill Sinks",
DEM_PREPROC = file.path(tempdir(), "sdem.sdat"),
show_output_paths = FALSE)
Next, we compute the catchment area (or upslope contributing area)
and its mean slope from the preprocessed DEM. These raster datasets are
calculated while calculating the ‘Saga wetness index’, which is also
frequently used as a predictor in ecological studies. To calculate the
catchment slope instead of the local slope we set the
SLOPE_TYPE argument to 1. The names of the output rasters
are carea.sdat and cslope.sdat both of which
will be stored in tempdir().
get_usage("saga:sagawetnessindex")
## ALGORITHM: Saga wetness index
## DEM <ParameterRaster>
## SUCTION <ParameterNumber>
## AREA_TYPE <ParameterSelection>
## SLOPE_TYPE <ParameterSelection>
## SLOPE_MIN <ParameterNumber>
## SLOPE_OFF <ParameterNumber>
## SLOPE_WEIGHT <ParameterNumber>
## _RESAMPLING <ParameterSelection>
## AREA <OutputRaster>
## SLOPE <OutputRaster>
## AREA_MOD <OutputRaster>
## TWI <OutputRaster>
##
##
## AREA_TYPE(Type of Area)
## 0 - [0] absolute catchment area
## 1 - [1] square root of catchment area
## 2 - [2] specific catchment area
## SLOPE_TYPE(Type of Slope)
## 0 - [0] local slope
## 1 - [1] catchment slope
## _RESAMPLING(Resampling method)
## 0 - Nearest Neighbour
## 1 - Bilinear Interpolation
## 2 - Bicubic Spline Interpolation
## 3 - B-Spline Interpolation
run_qgis(alg = "saga:sagawetnessindex",
DEM = file.path(tempdir(), "sdem.sdat"),
SLOPE_TYPE = 1,
SLOPE = file.path(tempdir(), "cslope.sdat"),
AREA = file.path(tempdir(), "carea.sdat"),
show_output_paths = FALSE)
To have a look at the output rasters, we can run (not shown):
library("dplyr")
library("raster")
file.path(tempdir(), c("cslope.sdat", "carea.sdat")) %>%
raster::stack(.) %>%
plot
We furthermore need to apply some transformations to these raster files for improved interpretability. On the one hand, we convert slope angle from radians to degrees.
library("raster")
cslope <- raster(file.path(tempdir(), "cslope.sdat"))
cslope <- cslope * 180 /pi
On the other hand, we divide the catchment area variable by one million to change the unit from m2 to km2. Furthermore, we transform it logarithmically since it is strongly skewed to the right.
carea <- raster(file.path(tempdir(), "carea.sdat"))
log_carea <- log(carea / 1e+06)
Apart from the catchment area and catchment slope, our final model
requires the NDVI as an indicator of vegetation properties. To calculate
it, we would have to perform pixel-by-pixel arithmetic operations on
raster data sets representing the Landsat satellite’s spectral bands
three (red) and four (near infrared). These so-called map algebra
operations are available through raster and the QGIS modules
saga:rastercalculator and
grass:r.mapcalculator; however, the RQGIS package
already provides the NDVI raster as an example dataset. Therefore, we
merely have to attach the data to our workspace.
data("ndvi", package = "RQGIS")
To account for the nonlinear unimodal relationship between elevation
and species richness, we need two rasters representing the second-order
orthogonal polynomials of the original DEM. First, we convert the
elevation unit from m to km by dividing the original DEM raster by 1000.
Next, we use R’s built-in poly() function to calculate the
orthogonal polynomials for each pixel in our DEM raster to avoid
collinearity among the predictors. Before saving the resulting rasters
and the catchment slope, the catchment area and the NDVI to a temporary
output location, we apply the crop() function from the
raster package to ensure that all rasters share the same
extent.
data("dem", package = "RQGIS")
dem <- dem / 1000
my_poly <- poly(values(dem), degree = 2)
dem1 <- dem2 <- dem
values(dem1) <- my_poly[, 1]
values(dem2) <- my_poly[, 2]
for (i in c("dem1", "dem2", "log_carea", "cslope", "ndvi")) {
tmp <- crop(get(i), dem)
writeRaster(x = tmp,
filename = file.path(tempdir(), paste0(i, ".asc")),
format = "ascii",
prj = TRUE,
overwrite = TRUE)
}
After creating these raster datasets, we would like to extract their
attributes to the randomly sampled points. Conveniently, RSAGA’s
pick.from.ascii.grids() function accepts multiple rasters
for parallel attribute extraction. Note that
pick.from.ascii.grids() is a pure R function, i.e., it runs
without accessing SAGA. Here, we only extract the predictor variables
needed in the final model (see above). The "sf" object
random_points refers to randomly sampled points we visited
in the field. Unfortunately, pick.from.ascii.grids() does
not accept spatial point objects as input; instead we have to provide it
with a "data.frame" and indicate which columns refer to the
coordinates. In the file argument we specify the raster
files from which we would like to extract values to our points. The
output columns in vals will be named like the input
rasters.
library("dplyr")
data("random_points", package = "RQGIS")
random_points[, c("x", "y")] <- sf::st_coordinates(random_points)
raster_names <- c("dem1", "dem2", "log_carea", "cslope", "ndvi")
vals <- RSAGA::pick.from.ascii.grids(data = as.data.frame(random_points),
X.name = "x",
Y.name = "y",
file = file.path(tempdir(), raster_names),
varname = raster_names)
dplyr::select(vals, -geometry) %>%
head(., 3)
## id spri x y dem1 dem2 log_carea cslope ndvi
## 1 1 4 797179 8932755 -0.01049 0.01268 -1.21800 21.18 -0.3603
## 2 2 4 796749 8932621 -0.01019 0.01163 0.04145 13.02 -0.3488
## 3 3 3 796816 8932739 -0.01008 0.01124 -0.48148 23.70 -0.3396
To model species richness, which is a count variable, we naturally opt for a Poisson regression. Overall, spatial count regression models are popular across many fields including epidemiology (Fernandez et al. 2012), demography (Chien, Guo, and Zhang 2016), criminology (Jones-Webb and Wall 2008), remote sensing (Comber et al. 2016) and ecology (Moreno-Fernández et al. 2015). Since we observed a unimodal nonlinear relationship between species richness and elevation, elevation enters the model as a second-order polynomial function.
fit <- glm(formula = spri ~ dem1 + dem2 + cslope + ndvi + log_carea,
data = vals,
family = "poisson")
To spatially predict species richness (Figure 5), we apply the estimated beta coefficients to
the input rasters. raster’s predict function does
this by accepting the fitted model and the predictor rasters as input.
Finally, the crop-function ensures that the predictions are
restricted to the study area.
library("sf")
library("raster")
raster_names <- c("dem1.asc", "dem2.asc", "log_carea.asc", "cslope.asc",
"ndvi.asc")
s <- stack(x = file.path(tempdir(), raster_names))
pred <- predict(object = s,
model = fit,
fun = predict,
type = "response")
pred <- crop(x = pred,
y = as(random_points, "Spatial"))
# plot the output (shown in Figure 5)
plot(pred)
plot(st_geometry(random_points), add = TRUE)
Note that an alternative to raster::predict() is
RSAGA’s multi.local.function() in conjunction with
grid.predict() (see Alexander
Brenning 2008).
`r''````{r, eval = FALSE, message = FALSE}
library("RSAGA")
multi.local.function(path = tempdir(), in.grids = c("dem1","dem2",
"cslope", "ndvi", "carea"), out.varnames = "pred", fit = fit,
fun = grid.predict, trafo = my_trafo,
control.predict = list(type = "response"),
env = rsaga.env(path = "C:/OSGEO4~1/apps/saga"))
```
The model reached a good goodness-of-fit (explained deviance divided by null deviance) of 0.78. The most important variable in predicting species richness was elevation and its squared term. In our interpretation, variation with elevation mainly relates to differences in water availability. Humidity, and thus species richness, is greatest just below the temperature inversion (ca. 750–850 m). For a more detailed interpretation of the model and its predictors, refer to Muenchow, Feilhauer, et al. (2013).
In this section we would like to show examplarily how one can easily
extend RQGIS through Python and especially PyQGIS. As explained
in sections 2.1 and 2.2, RQGIS uses the
reticulate package to establish a tunnel to the QGIS API. To
find out which Python binary is in use we run the
py_config() function of the reticulate package.
When using Windows, please do so only after having run
open_app before, since this sets all necessary paths to run
the QGIS Python binary. Otherwise py_config() will be
unable to find it, in which case it most likely would use another Python
binary (e.g., the one shipped with Anaconda) if available. Additionally,
open_app() imports various necessary Python libraries
(among others osgeo, processing and
qgis), and attaches our Python RQGIS class (see section 2.2).
library("reticulate")
py_config()
## python: C:/OSGeo4W64/bin/python.exe
## libpython: C:/OSGeo4W64/bin/python27.dll
## pythonhome: C:\OSGeo4W64\apps\Python27
## version: 2.7.5 (default, May 15 2013, 22:44:16) [MSC v.1500 64 bit (AMD64)]
## Architecture: 64bit
## numpy: C:\OSGeo4W64\apps\Python27\lib\site-packages\numpy
## numpy_version: 1.12.1
##
## NOTE: Python version was forced by use_python function
We have defined the Python class RQGIS in python_funs.py
which is part of ‘inst/python’. Aside from set_env() all
functions found in ‘R/processing’ make extensive use of the Python RQGIS
methods. Most of the Python methods have the same names as their
counterparts in R. We can access them with py_run_string()
of the reticulate-package. This sends a call to Python, and
converts the Python into R output if desired.
py_run_string("methods = dir(RQGIS)")$methods
## [1] "__doc__" "__init__" "__module__"
## [4] "check_args" "get_args_man" "get_options"
## [7] "open_help" "qgis_session_info"
Of course, we can use the Python RQGIS methods directly via
reticulate which is exactly what the RQGIS-package is
doing. For example, to find out what the options are for a QGIS
geoalgorithm named qgis:randompointsinsidepolygonsvariable,
we can run:
py_cmd <- "opts = RQGIS.get_options('qgis:randompointsinsidepolygonsvariable')"
py_run_string(py_cmd)$opts
## $STRATEGY
## [1] "Points count" "Points density"
Or we can use PyQGIS-functionality directly. For instance, assuming
we would like to find out the function parameters of
qgis:randompointsinsidepolygonsvariable, we can use
alghelp() from the QGIS Python processing framework (Graser and Olaya 2015 and see also https://docs.qgis.org/2.8/en/docs/user_manual/processing/console.html).
Here, we only show the first 40 characters of the output.
py_cmd <- "processing.alghelp('qgis:randompointsinsidepolygonsvariable')"
py_capture_output(py_run_string(py_cmd)) %>%
substring(., 1, 40)
## [1] "ALGORITHM: Random points inside polygons"
Users can easily extend the RQGIS class with additional methods, or
they could write their own classes and methods. Also if there is a need
to write Python one-liners or make use of some Python functionality,
this can easily be done using RQGIS in conjunction with
reticulate. Here, we will present one last example. Frequently,
users have asked us if it was possible to also use the QGIS map canvas
from within R. Therefore, we provide a proof-of-concept how this could
be achieved (see also http://docs.qgis.org/testing/en/docs/pyqgis_developer_cookbook/canvas.html).
First of all, we save random_points to a temporary output
location.
data("random_points", package = "RQGIS")
file <- normalizePath(file.path(tempdir(), "points.shp"), winslash = "/",
mustWork = FALSE)
sf::st_write(random_points, dsn = file)
Before running the subsequent code, make sure that you have already
attached the packages RQGIS and reticulate.
Additionally, you need to run open_app() first. Then we can
create the map canvas, add the previously saved shapefile to it, set the
extent to the extent of our shapefile, set the map canvas layer, and
finally open a standalone map window (Figure 6). If this does not open a standalone window
you might have to run py_run_string(’app.exec_()’) to
initialize a Qt event loop which in turn renders the points in a
standalone window (pers. comm. Barry Rowlingson). We emphasize that this
is only a proof-of-concept, and a rather unstable solution (see section
5.4).
# create the map canvas
py_run_string("canvas = QgsMapCanvas()")
# import point shapefile
py_run_string(sprintf("layer = QgsVectorLayer('%s', 'points', 'ogr')", file))
# add imported point layer to the map canvas
py_run_string("QgsMapLayerRegistry.instance().addMapLayer(layer)")
# set the extent of the map canvas to the extent of the imported shapefile
py_run_string("canvas.setExtent(layer.extent())")
# set the map canvas layer
py_run_string("canvas.setLayerSet([QgsMapCanvasLayer(layer)])")
# open a standalone window
py_run_string("canvas.show()")
# if a standalone window has not already opened, run the next line
# py_run_string("app.exec_()")
In our use case (see section 3) we mainly used QGIS for raster preprocessing and the generation of terrain attributes. Naturally, there are many more geospatial analysis problems that can be solved using the thousands of geoalgorithms that are accessible through RQGIS and other R–GIS packages. For instance, SAGA provides more than 600 (Conrad et al. 2015) and GRASS more than 500 functions (http://grass.osgeo.org/grass72/manuals/).
For example, apart from relatively simple DEM derivatives (slope angle and orientation), a GIS can also calculate more complex, process-oriented terrain attributes such as the relative slope position or the topographic wetness index (Conrad et al. 2015). Additionally, most GIS can easily extract stream networks (T. Hengl, Heuvelink, and van Loon 2010) and surface roughness (Grohmann 2004). Somewhat related are geospatial calculations concerned with terrain classification and landform identification (Alexander Brenning 2012b; Rocchini et al. 2013). Physically-based models (e.g., SHALSTAB or Factor of Safety, both available in SAGA GIS) may provide additional insights (Goetz, Guthrie, and Brenning 2011). Of course, GIS also provide an extensive suite of vector processing tools as required, for example, in geomarketing.
As pointed out in the beginning R has its limitations regarding GIS capabilities, but when it comes to statistical analyses, R is the uncontested champion in its field. For instance, instead of using a polynomial function in our use case (see section 3.2), we could have used a generalized additive model (GAM) with a logarithmic link function to allow for nonlinear relationships between various predictors and species richness (Figure 4). A GAM is the nonlinear extension of a generalized linear model (GLM), and uses smoothing functions to deal with nonlinearity (Hastie 2017). In ecology, coupling ordination techniques and GAMs is a fruitful approach to spatially predict ecological communities (Muenchow, Bräuning, et al. 2013).
Equally, machine learning algorithms (support vector machines, random forests, etc.) are readily available in R, although these methods may tend to overfit the training data (A. Brenning 2005). Overfitting in turn limits a model’s ability to spatially predict the response variable. Here, spatial cross-validation through the sperrorest package (Alexander Brenning, Long, and Fieguth 2012) provides an opportunity to assess spatial predictive capabilities.
Frequently, the residuals of spatial models show some form of spatial autocorrelation. This violates the assumption of independent model residuals made by most statistical models (Dormann et al. 2007; Zuur et al. 2009). Violating this assumption can lead to untrustworthy \(p\) values, biased coefficients and subsequently poor predictions (Zuur et al. 2009). Fortunately, packages such as nlme (Pinheiro, Bates, and R-core 2017) and mgcv (Wood 2017) let the user incorporate various correlation structures into a model. This is most helpful in the presence of temporal and spatial autocorrelation (e.g, Iturritxa, Mesanza, and Brenning 2015). Sometimes mixed-effect models may also account for autocorrelation since random intercepts allow for correlation within a group (e.g., Peters et al. 2014). Another way of dealing with spatial autocorrelation is e.g., to include auto-regressive correlation structures in a GLM or GAM within a Bayesian modeling approach (Zuur and Ieno 2017).
To summarize, R offers an incredibly vast suite of advanced statistical and data science methods. On the other hand, QGIS and other GIS software offer a rich suite of geoalgorithms and geocomputational power. Interfacing R with GIS simply combines the best of two worlds for automated statistical geocomputing.
Compared to the two separate packages rgrass7 and
RSAGA, RQGIS has the advantage of providing a unified
interface to both GRASS and SAGA GIS toolboxes. Moreover, QGIS
facilitates the usage of third-party geoalgorithms by automatically
converting vector (e.g., shapefiles) and raster formats (e.g., ASCII
grid files) into the particular format supported by the third-party
module. For instance, SAGA has its own grid format (sgrd-files) and
GRASS uses its own database format. Running SAGA or GRASS functions,
(R)QGIS automatically converts the input data using
io_gdal() in the case of SAGA and v.in.ogr()
or r.in.gdal() in the case of GRASS. Though this is
extremely user-friendly especially when providing interfaces to various
third-party providers, it comes at the prize of increased computing time
due to the necessity of multiple format conversions during one
geoalgorithm call. Equally user-friendly it the automatic setup of the
GRASS environment (projection, region and mapset) through (R)QGIS, if
necessary. This certainly facilitates access to GRASS, especially for
less experienced GIS users. Finally, RQGIS is overall quite
user-friendly due to its convenience functions open_help()
and get_args_man() and through its support of R named
arguments for geoalgorithm parameters (see sections 2.1 and 2.2).
However, (R)QGIS only integrates a subset of the modules available in SAGA and GRASS GIS. While this fraction is likely to grow in the near future, a full integration of all modules is improbable as it would duplicate functionality (though this of course already has happened) and interface functions that are unnecessary within the QGIS environment, such as the GRASS database functions. If the user’s intention is to use GRASS’s database management system (DBMS), the direct R–GRASS integration via the spgrass6 (R. Bivand, Pebesma, and Gómez-Rubio 2013) and rgrass7 packages (R. Bivand and Neteler 2000) would be the appropriate path. RQGIS does not provide access to this DBMS since the GRASS plugin of the QGIS processing toolbox only allows restricted access to GRASS’s DBMS functionality. The use of rgrass7 also allows the user to operate within a single GRASS session instead of calling a new one for each GRASS command as implicitly done by QGIS.
In the case of SAGA GIS, RSAGA has additional benefits.
First of all, RSAGA provides numerous user-friendly wrapper
functions with arguments (and meaningful default values) documented in
the R help pages. RSAGA also strives to provide unified access
to a range of SAGA versions while using, if possible, persistent
function and argument names as well as default values. This allows for
an easier migration between SAGA versions. At the moment RSAGA
supports versions 2.0.4–2.2.3, but support for SAGA versions until the
current version 6.1 has already been developed, and is currently being
tested. By contrast, QGIS 2.14 supports SAGA 2.1.2–2.3.1. With the
release of QGIS 2.18.10, this support was limited to the long term SAGA
release 2.3.x. Extremely useful are furthermore RSAGA’s special
geocomputing functions that allow, for example, the application of any
user-defined R function to a stack of grids, either locally or within
moving windows (functions multi.focal.function() and
multi.local.function()). In conjunction with
grid.predict(), predict methods of models fitted in R can
therefore also be applied to stacks of raster files, as shown in Alexander Brenning (2008).
In the end, it will be up to the user to decide whether to use RQGIS, RSAGA, rgrass7 or some combination of these, depending on the user’s preferences, expertise and tasks at hand. In any case, we would recommend RQGIS if a user
requires mainly the more commonly used SAGA and GRASS functions,
does not want to be bothered with setting up the GRASS environment,
does not plan to use GRASS geodatabase capabilities,
does not want to worry about spatial data format conversions,
would like to use spatial objects residing in R as input-arguments, and load GIS output automatically into R,
cherishes the flexibility of seamlessly integrating QGIS, GRASS, SAGA and other third-party software (GDAL/OGR, Orfeo Toolbox, LAStools, TauDEM) within a single geoprocessing workflow in R.
The integration of GIS functionality into R is not limited to the mentioned open-source GIS software, but also includes the global leader in commercial GIS software (Longley et al. 2011), ArcGIS, through the RPyGeo package (Alexander Brenning 2012a). This package piggybacks on ArcGIS’s own Python interface to ArcGIS functions, or ‘tools’. RPyGeo generates Python code that calls ArcGIS. Some of the challenges currently include
latency times related to launching Python and ArcGIS before each GIS call,
passing data and files between R and ArcGIS,
error tracking based on Python and/or ArcGIS error messages,
interpretation of ArcGIS help pages from the perspective of a geoprocessing interface function in R.
While some of these limitations may be overcome in future releases of RPyGeo and ArcGIS, the growing interest in integrating R and ArcGIS is also evident from recent efforts to provide access to R from within ArcGIS (https://r-arcgis.github.io/). This so-called R–Bridge allows users to
retrieve data from ArcGIS geodatabases into R as
"sp" objects, and export R data back into ArcGIS
geodatabases,
run R code within ArcGIS as a user-defined ‘tool’. This is similar to QGIS’s ability to integrate R scripts as user-defined GIS modules in the Processing toolbox.
While this connectivity, in principle, goes both ways, the integration of ArcGIS into R through the R–Bridge is currently limited to data import/export, which is complementary to RPyGeo’s capability of executing ArcGIS modules from within R.
There are several interesting developing directions in the R-spatial
world and for RQGIS. For example, we have been asked multiple
times to include the QGIS mapping widget within R for fast interactive
visualization and styling of multiple layers. In section 4 we have shown that this is theoretically
possible. However, mixing R and Qt events seems to cause frequently
trouble (also pers. comm. Barry Rowlingson). For instance, when running
QgsMapCanvas() twice or enlarging the standalone window
manually (Figure 6), one runs a high risk
of crashing the current R session. Therefore, Kevin Stadler, Barry
Rowlingson and Julia Wagemann have opted for a different approach
realized within a Google
Summer of Code Project. In this project they wrote the QGIS Plugin
Network
API which enables the user to access the QGIS API via a HTTP
interface from another language capable of making HTTP calls (such as
R). Along with the sibling R package qgisremote (https://qgisapi.gitlab.io/qgisremote/index.html) this
allows R users to seamlessly exchange spatial data between R and QGIS.
For example, one can add vector and raster layers from within R to the
QGIS map canvas, interactively edit them (such as adding new points or
changing polygon vertices), and load the results back into R. Similarly,
the packages mapview (build on top of leaflet,
Cheng, Karambelkar, and Xie 2017) and mapedit lets the
user interactively visualize and edit spatial objects on leaflet maps.
The advantage of qgisremote over these two packages is that the
QGIS map canvas probably is better able to handle larger quantities of
spatial data compared to leaflet, and that it provides the user with the
full power of a Desktop GIS for manually editing spatial data (trace,
snap, etc.). Nevertheless, mapview is a great tool for
interactive visualization, and mapedit is a good alternative
for small and quick modifications of spatial vector data within R.
Coming back to the user request to include the QGIS mapping widget into
RQGIS, we can state that qgisremote already filled
this gap perfectly. In summary, RQGIS and qgisremote
complement one another. The first gives an R user direct access to the
QGIS processing engine, and the latter hands an R user the power of a
Desktop GIS graphical user interface.
Adding new functionalities to RQGIS will generally include Python programming. Since QGIS migrates from Python 2 to Python 3 by the end of 2017, it is probably best to postpone major RQGIS extensions until after the migration. To guarantee a smooth transition, and to offer the possibility to work either with QGIS 2 or QGIS 3, we have designed RQGIS in such a way that we ideally would merely have to add a Python 3 script to ‘inst/python’ to make RQGIS also work with QGIS 3. In the case of a bumpier transition than anticipated, RQGIS users may rest assured that RQGIS will work in any case with the QGIS long term release 2.18 which will be further developed and regularly updated (see https://www.qgis.org/en/site/getinvolved/development/roadmap.html#release-schedule).
Another envisaged RQGIS update includes the support for
"stars" classes (see https://github.com/r-spatial/stars). stars aims
to mainly extend the raster package.
Combining R and GIS software creates a powerful environment for
advanced statistical geocomputing. RQGIS makes this also
possible with QGIS—one of the most-widely used open-source GIS, which is
therefore probably also very appealing to R users. Conveniently,
RQGIS offers a unified interface to various desktop GIS (SAGA,
GRASS, etc.) that are integrated into QGIS. The use of GIS tools is
facilitated through auxiliary functions for the automatic retrieval of
function arguments and their default values, the support of R named
arguments in run_qgis(), the seamless exchange of spatial
data types, and the quick access of the online help for any QGIS
geoalgorithm.
We greatly appreciate the work of the QGIS development and the R core team. We are also greatly indebted to Dr. Rainer Krug (University of Zurich) who not only reviewed our manuscript two times but also improved it with his valuable suggestions. Of course, we thank Dr. Roger Bivand, our editor, for handling our manuscript. We are also grateful to Dr. Tomislav Hengl (SRIC – World Soil Information, Wageningen), and an anonymous reviewer for valuable feedback on a previous manuscript version. Finally, we would like to thank Barry Rowlingson (Lancaster University) for fruitful discussions on Python, QGIS and R.