Abstract
Tools for transport planning should be flexible, scalable, and transparent. The stplanr package demonstrates and provides a home for such tools, with an emphasis on spatial transport data and non-motorized modes. The stplanr package facilitates common transport planning tasks including: downloading and cleaning transport datasets; creating geographic “desire lines” from origin-destination (OD) data; route assignment, locally and interfaces to routing services such as CycleStreets.net; calculation of route segment attributes such as bearing and aggregate flow; and ‘travel watershed’ analysis. This paper demonstrates this functionality using reproducible examples on real transport datasets. More broadly, the experience of developing and using R functions for transport applications shows that open source software can form the basis of a reproducible transport planning workflow. The stplanr package, alongside other packages and open source projects, could provide a more transparent and democratically accountable alternative to the current approach, which is heavily reliant on proprietary and relatively inaccessible software.Transport planning can broadly be defined as the process of designing and evaluating transport interventions (Ortuzar and Willumsen 2011), usually with the ultimate aim of improving transport systems from economic, social, and environmental perspectives. This inevitably involves a degree of subjective judgment and intuition. With the proliferation of new transport datasets — and the increasing availability of hardware and software to make sense of them — there is great potential for the discipline to become more evidence-based and scientific (Balmer, Rieser, and Nagel 2009). Transport planners have always undertaken a wide range of computational activities (Boyce and Williams 2015), but with the digital revolution the demands have grown beyond the capabilities of a single, monolithic product. The diversity of tasks, and the need for democratic accountability in public decision making, suggests that future-proof transport planning software should be:
flexible, able to handle a wide range of data formats
scalable, able to work at multiple geographic levels from single streets to large cities and regions
robust and reliable, tested on a range of datasets and able to work “out of the box” in a range of real-world projects
open source and reproducible, ensuring transparency and encouraging citizen science
This paper sets out to demonstrate that open source software with a command-line interface (CLI) can provide a foundation for transport planning software that meets each of these criteria. R provides a strong basis for progress in this direction because it already contains functionality used in common transport planning workflows. The sp, rgeos, and rgdal packages greatly improved R’s spatial abilities (Bivand, Pebesma, and Gómez-Rubio 2013), work that is being consolidated and extended in the recent sf package.
Building on these foundations, a number of spatial packages have been developed for applied domains including: disease mapping and modelling, with packages such as SpatialEpi and diseasemapping (Kim and Wakefield 2016; Brown and Zhou 2016); spatial ecology, with the adehabitat family of packages (Calenge 2006); and visualisation, with packages such as leaflet, tmap, mapview, and mapmisc (Brown 2016). However, there has been little prior work to develop R functionality designed specifically for transport planning, with the notable exceptions of TravelR (a package on R-Forge last updated in 2012) and gtfsr (a package for handling General Transit Feed Specification (GTFS) data).
The purpose of stplanr is to provide a toolbox rather than a specific solution for transport planning, with an emphasis on spatial data and active modes. This emphasis is timely given the recent emphasis on sustainability (Banister 2008) and ‘Big Data’ (Zheng et al. 2016) in the wider field of transport planning.
A major motivation was the lack of R packages, and open source software in general, for transport applications. This may be surprising given the ubiquity of transport problems;1 R’s proficiency at handling spatial, temporal and travel survey data that describe transport systems; and the growing popularity of R in applied domains (Jalal et al. 2017; Moore and Hutchinson 2017). Another motivation is the growth in open access datasets: the main purpose of early versions of the package was to process open origin-destination data (Robin Lovelace et al. 2017).
R is already used in transport applications, as illustrated by recent research that applies packages from other domains to transport problems. For instance, Efthymiou and Antoniou (2012) ((2012)) use R to analyse the data collected from an online survey focused on car-sharing, bicycle-sharing, and electric vehicles. Efthymiou and Antoniou (2012) ((2012)) also used R to collect and analyse transport-related data from Twitter using packages including XML, twitteR and ggplot2. These packages were used to download, parse and plot the Twitter data using a method that can be repeated and the results reproduced or updated. More general statistical analyses have also been conducted on transport-related datasets using packages including muStat and mgcv (Diana 2012; Cerin et al. 2013). Despite the rising use of R for transport research, there has yet been to be a package for transport planning.
The design of the R language, with its emphasis on flexibility, data processing, and statistical modelling, suggests it can provide a powerful environment for transport planning research. There are many quantitative methods in transport planning, many of which fit into the classic ‘four stage’ transport model which involves the following steps (Ortuzar and Willumsen 2011): (1) trip generation to estimate trip frequency from origins; (2) distribution of trips to destinations; (3) modal split of trips between walking, cycling, buses etc.; (4) assignment of trips to the transport route network. To this we would like to add two more stages for the big data age: (0) data processing and exploration; and (5) validation. This sequence is not the only way of transport modelling and some have argued that its dominance has reduced innovation. However it is certainly a common approach and provides a useful schema for classifying the kinds of task that stplanr can tackle:
accessing and processing of data on transport infrastructure and behaviour (stage 0)
analysis and visualisation of the transport network (0)
analysis of origin-destination (OD) data and the visualisation of resulting ‘desire lines’
the allocation of desire lines to roads and other guideways via routing services
the aggregation of routes to estimate total levels of flow on segments throughout the transport network
development of models to estimate transport behaviour currently and under various scenarios of change
the calculation of “catchment areas” affected by transport infrastructure
The automation of such tasks can assist researchers and practitioners to create evidence for decision making. If the data processing and analysis stages are fast and painless, more time can be dedicated to visualisation and decision making. This should allow researchers to focus on problems, rather than on clunky graphical user interfaces (GUIs), and ad-hoc scripts that could be generalised. Furthermore, if the process can be made reproducible and accessible (e.g. via online visualisation packages such as shiny), this could help transport planning move away from reliance on “black boxes” (Waddell 2002) and empower citizens to challenge decisions made by transport planning authorities based on the evidence (Hollander 2016).
There are many advantages of using a scriptable, interactive, and open source language such as R for transport planning. Such an approach enables: reproducible research; the automation and sharing of code between researchers; reduced barriers to innovation, as anyone can create new features for the benefit of all planners; easier interaction with non domain experts (who will lack dedicated software); and integration with other software systems, as illustrated by the use of leaflet to generate JavaScript for sharing interactive maps for transport planning, used in the publicly accessible Propensity to Cycle Tool (Robin Lovelace et al. 2017). Furthermore, R has a strong user community which can support newcomers (stplanr was peer reviewed thanks to the community surrounding ROpenSci). The advantages of using R specifically to develop the functionality described in this paper are that it has excellent geo-statistical capabilities (Pebesma, Bivand, and Ribeiro 2015), visualisation packages (e.g. tmap, ggplot2), support for logit models (which are useful for modelling modal shift), and support for the many formats that transport datasets are stored in (e.g., via the haven and rio packages).
The package can be installed and attached as follows (see the package’s README for dependencies and access to development versions):
install.packages("stplanr")
library(stplanr)
stplanr imports both sp and its successor
sf. This means that spatial objects used in and produced by the
package work with base R generic functions such as summary,
aggregate, and plot (Bivand, Pebesma, and Gómez-Rubio 2013).
Furthermore, output objects of class "sf" are
mostly compatible with the popular data processing package dplyr.
The package’s core functions are structured around three common types of spatial transport data:
Origin-destination (OD) data, which report the number of people
travelling between origin-destination pairs. This type of data is not
explicitly spatial (OD datasets are usually represented as data frames)
but represents movement over space between points in geographical space.
An example is provided in the flow dataset.
Line data, one dimensional linear features on the surface of the
Earth. These are typically stored as a
"SpatialLinesDataFrame" object.
Route data are special types of lines which have been allocated
to the transport network. Routes typically result from the allocation of
a straight “desire line” allocated to the route network with a
route_ function. Route network represent many overlapping
routes. All are typically stored as a
"SpatialLinesDataFrame" object.
A convention has been developed whereby function names are prefixed
depending on the the input data type (od_,
line_ and route_ respectively, although
route_ functions do not take routes as inputs, they output
them). A selection of these is presented in Table 1
(lsf.str("package:stplanr") returns a list of all
functions). Additional “core functions” could be developed, such as
those prefixed with rn_ (for working with route network
data) and geo_ functions for geographic operations such as
buffer creation on lat/lon projected data (this function is currently
named buff_geo).
| Function | Input data type(s) | Output data type |
|---|---|---|
od_dist |
Data frame | Numeric vector |
od_id_order |
Data frame | Data frame |
line_bearing |
Spatial line | Numeric vector |
line_midpoint |
Spatial line | Spatial points |
route_cyclestreet |
Coordinates, spatial point or text | Spatial lines |
route_graphhopper |
Coordinates, spatial point or text | Spatial lines |
We aim to preserve type in some functions: line2route,
for example, takes spatial lines objects and returns spatial lines
objects. Type stability has its limitations with spatial data, however:
it would be wasteful for functions such as line_bearing
(which returns the bearing of a line) to duplicate the spatial data
contained in its input, for instance. Generic classes enable
stplanr to handle objects of class "sf" from the
new sf package in a type preserving way.
A class system has not been developed for each data type (this option is discussed in the final section) and more classes would be possible: transport datasets are diverse. This diversity helps explain why some functions have more ad-hoc names. Rather than attempting a systematic description of each of stplanr’s functions, the remainder of this paper shows how the package can be used for transport planning, beginning with data access and ending with visualisation.
Gaining access to data is often the problem in transport planning. It can be a long and protracted process but is becoming easier thanks to the “open data” movement and packages such as tigris and osmdata (Walker 2016).
The stplanr package helps import data with functions
including read_table_builder, for importing data from the
Australian Bureau of Statistics (ABS), and dl_stats19 —
which has now been split-out into the package stats19 —
for downloading datasets from the UK’s Stats19 road traffic casualty
system (R. Lovelace et al. 2019). A brief
example of the latter is demonstrated below, which begins with
downloading the data (warning this downloads ~100 MB of data):
dl_stats19() # download and extract stats19 road traffic casualty data
#> [1] "Data saved at: /tmp/RtmpppF3E2/Accidents0514.csv"
#> [2] "Data saved at: /tmp/RtmpppF3E2/Casualties0514.csv"
#> [3] "Data saved at: /tmp/RtmpppF3E2/Vehicles0514.csv"
Once the data has been saved in the default directory, determined by
tempdir, it can be read-in and cleaned with the
read_stats19_ functions (note these call
format_stats19_ functions internally to clean the datasets
and add correct labels to the variables):
ac <- read_stats19_ac()
ca <- read_stats19_ca()
ve <- read_stats19_ve()
The resulting datasets (representing accident, casualty, and vehicle level data, respectively) can be merged and made geographic, as illustrated below:2
library(dplyr)
ca_ac <- inner_join(ca, ac)
ca_fatal <- ca_ac %>%
filter(Casualty_Severity == "Fatal" & !is.na(Latitude)) %>%
select(Age = Age_of_Casualty, Mode = Casualty_Type, Longitude, Latitude)
ca_sp <- sp::SpatialPointsDataFrame(coords = ca_cycle[3:4], data = ca_cycle[1:2])
Now that this casualty data has been cleaned, subsetted (to only
include fatal crashes) and converted into a spatial class system, we can
analyse the data using geographical datasets of the type commonly used
by stplanr. The following code, for example, geographically
subsets the dataset to include only crashes that occurred within the
bounding box of a sample route
network dataset contained as a dataset in stplanr for
illustrative purposes. Note the use of bb2poly, which
converts a spatial dataset into a box, represented as a rectangular
SpatialPolygonsDataFrame:
data("route_network")
sp::proj4string(ca_sp) <- sp::proj4string(route_network)
bb <- bb2poly(route_network)
sp::proj4string(bb) <- sp::proj4string(route_network)
ca_local <- ca_sp[bb,]
The above code chunk shows the importance of understanding
geographical data when working with transport data. It is only by
converting the casualty data into a spatial data class, and adding a
coordinate reference system (CRS), that transport planners and
researchers can link this important dataset back to the route network.
We can now perform GIS operations on the results. The next code chunk,
for example, finds all the fatalities that took place within 100 m of
the route network, using the function buff_geo:
rnet_buff_100 <- buff_geo(route_network, width = 100)
ca_buff <- ca_local[rnet_buff_100,]
These can be visualised using base R graphics, extended by sp, as illustrated in Figure 1. This provides a good start for analysis but for publication-quality plots and interactive plots, designed for public engagement, we recommend using dedicated visualisation packages that work with spatial data such as tmap.
plot(bb, lty = 4)
plot(rnet_buff_100, col = "grey", add = TRUE)
points(ca_local, pch = 4)
points(ca_buff, cex = 3)
National censuses are one common source of transport data, that
frequently include questions on where people live and work. This is
often accompanied by a question on what mode was taken. These data are
generally provided in standard tables that can be downloaded for free.
For the Australian census, both these standard tables are available as
well as custom tables that can be produced using a service named
TableBuilder. TableBuilder generates Excel or CSV files that contain
additional lines that make reading these data into R difficult. The
stplanr package provides the read_table_builder
function to read in these files.
Using the example SA1Population.xlsx file included with
stplanr that contains the population by SA1 zone (a statistical
area used for the Australian census), we can use the
read_table_builder function to read and format the table
for use in R. The function automatically removes the additional lines
and sets the column headers and data types as appropriate. The result is
a data frame that can be used with GIS boundary files and other datasets
produced for SA1 zones.
data_dir <- system.file("extdata", package = "stplanr")
t2 <- read_table_builder(file.path(data_dir, 'SA1Population.xlsx'),
filetype = 'xlsx', sheet = 1, removeTotal = TRUE)
Origin-destination (OD) data, which represent the number of people
travelling between geographical zones, is a key input for transport
planning (Calabrese et al. 2011). OD data
usually represent an aggregate data source, and are therefore able to
represent the travel patterns of an entire country in a file of
manageable size (see wicid.ukdataservice.ac.uk/
for example). They can be stored as a (sparse) matrix or (more commonly)
a long table of OD pairs. The long form is illustrated in the code chunk
below which shows a sample of the flow object.
flow is a data frame representing the number of home-work
commutes by mode between residential areas in the UK, provided provided
in stplanr
for teaching and demonstration purposes (see ?flow to see
how this dataset was created):
data("flow", package = "stplanr")
head(flow[c(1:3, 12)])
#> Area.of.residence Area.of.workplace All Bicycle
#> 920573 E02002361 E02002361 109 2
#> 920575 E02002361 E02002363 38 0
#> 920578 E02002361 E02002367 10 0
#> 920582 E02002361 E02002371 44 3
#> 920587 E02002361 E02002377 34 0
#> 920591 E02002361 E02002382 7 0
Although the flow data displayed above describes movement over
geographical space, it contains no explicitly geographical information.
Instead, the coordinates of the origins and destinations are linked to a
separate geographical dataset (represented by the cents
dataset), as illustrated below:4
data("cents", package = "stplanr")
as.data.frame(cents[1:3, -c(3,4)])
#> geo_code MSOA11NM coords.x1 coords.x2
#> 1708 E02002384 Leeds 055 -1.546463 53.80952
#> 1712 E02002382 Leeds 053 -1.511861 53.81161
#> 1805 E02002393 Leeds 064 -1.524205 53.80410
A common task is linking an OD dataset (e.g., flow) to a
geographic dataset representing zone centroids
(e.g. cents). We use od2line to combine them,
as illustrated in the code chunk below, which creates an object named
l, a spatial object that will be visualised in the next
section:
l <- od2line(flow = flow, zones = cents)
od2line function.
A larger example represents flights from New York in 2013 from the nycflights13
package. The code chunk below makes use of the function
geo_buffer, and highlights stplanr’s ability to
work with "Spatial" or "sf" objects
interchangeably. Figure 3 demonstrates how
the resulting spatial object can be plotted interactively using packages
such as tmap (see the accompanying R file for visualisation
code).
library(nycflights13)
airports_sf <- sf::st_as_sf(airports, coords = c("lon", "lat"), crs = 4326)
ny_buff <- geo_buffer(airports_sf[airports_sf$faa == "NYC",], dist = 1e6)
airports_near <- airports_sf[ny_buff,]
flights_near <- flights[flights$dest %in% airports_near$faa,]
flights_agg <- dplyr::group_by(flights_near, origin, dest) %>%
dplyr::summarise(Flights = n())
flights_sf = od2line(flow = flights_agg, zones = airports_near)
plot(flights_sf, lwd = flights_sf$Flights / 1e3)
A common problem faced by transport researchers is network allocation: converting the ‘as the crow flies’ lines illustrated in Figure 3 into routes. These are the complex, winding paths that people and animals make to avoid obstacles such as buildings and to make the journey faster and more efficient (e.g. by following the route network).
This is difficult (and was until recently near impossible using free software) because of the size and complexity of transport networks, the complexity of realistic routing algorithms and need for context-specificity in the routing engine. Inexperienced cyclists, for example, would take a very different route than a heavy goods vehicle. The stplanr package tackles this issue in two ways: by providing routing functionality that can work on locally stored data and by using third party APIs. Each approach has advantages: local routing functionality is fast and free, whereas online routing services can be more sophisticated (e.g., taking into account local traffic conditions) and work anywhere in the world without needing to download unwieldy datasets and new software libraries such as OSRM. A disadvantage of relying on online services is that they may break or change without warning in the future.
Route allocation is undertaken by route_ functions such
as route_cyclestreets and route_graphhopper.
These allocate a single OD pair (represented by from and
to arguments as coordinates, spatial point objects, or text
strings to be "geo-coded") to the transport network. This is illustrated
below with route_cyclestreet, which uses the CycleStreets.net API, a
routing service “by cyclists for cyclists” which provides "fastest",
"quietest," and "balanced" routes:5
route_bl <- route_cyclestreet(from = "Bradford, Yorkshire", to = "Leeds, Yorkshire")
route_c1_c2 <- route_cyclestreet(cents[1,], cents[2,])
The raw output from routing APIs is usually provided as a JSON or
GeoJSON text string. By default, route_cyclestreet saves a
number of key variables (including length, time, hilliness and busyness
variables generated by CycleStreets.net) from the attribute data
provided by the API. If the user wants to save the raw output, the
save_raw argument can be used:
route_bl_raw <- route_cyclestreet(from = "Bradford", to = "Leeds", save_raw = TRUE)
Additional arguments taken by the route_ functions
depend on the routing function in question. By changing the
plan argument of route_cyclestreet to
fastest, quietest, or balanced,
for example, routes favouring speed, quietness or a balance between
speed and quietness will be saved, respectively.
To automate the creation of route-allocated lines over many desire
lines, the line2route function loops over each line,
wrapping any route_ function as an input. The output is a
SpatialLinesDataFrame with the same number of dimensions as
the input dataset (see the right panel in Figure 4).
routes_fast <- line2route(l = l, route_fun = route_cyclestreet)
The result of this ‘batch routing’ exercise is illustrated in Figure
4. The red lines in the left hand panel
are very different from the hypothetical straight ‘desire lines’,
highlighting the importance of this route-allocation functionality (see
vignette("introducing-stplanr") and kateto.net for more
sophisticated visualisation methods).
plot(route_network, lwd=0)
plot(l, lwd = l$All / 10, add = TRUE)
lines(routes_fast, col = "red")
routes_fast$All <- l$All
rnet <- overline(routes_fast, "All", fun = sum)
rnet$flow <- rnet$All / mean(rnet$All) * 3
plot(rnet, lwd = rnet$flow / mean(rnet$flow))
To estimate the amount of capacity needed at each segment on the
transport network, the overline function demonstrated
above, is used to divide line geometries into unique segments and
aggregate the overlapping values. The results, illustrated in the
right-hand panel of Figure 4, could be
used to inform the decision making process at the route network level,
such as where to create new bus routes cycle paths.
Limitations with the route_cyclestreet routing API
include its specificity to one mode (cycling) and a single region (the
UK and part of Europe). To overcome these limitations, additional
routing APIs were added with the functions
route_graphhopper, route_transportapi_public,
and viaroute. These interface to Graphhopper, TransportAPI,
and the Open Source Routing Machine (OSRM) routing services,
respectively. Advanced users can set-up local routing services (as
demonstrated in a guide by DigitalOcean),
reducing the disadvantages of using online routing services (they rely
on on potentially slow internet connections, changeable APIs, and
variable/high prices).
A short example of finding the route by car and bike between New York
and Oaxaca demonstrates how route_graphhopper can collect
geographical and other data on routes by various modes, anywhere in the
world. The output, shown in Table 2, shows
that the function also saves time, distance, and (for bike trips)
vertical distance climbed for the trips.
ny2oaxaca1 <- route_graphhopper("New York", "Oaxaca", vehicle = "bike")
ny2oaxaca2 <- route_graphhopper("New York", "Oaxaca", vehicle = "car")
rbind(ny2oaxaca1@data, ny2oaxaca2@data)
| mode | time | dist | change_elev |
|---|---|---|---|
| Bike | 17522.73 | 4885663 | 87388.13 |
| Car | 2759.89 | 4754772 | NA |
Catchment areas are useful analytic and visual tools in transport planning. They can help who will benefit from a particular transport intervention (such as a new bus stop) and illustrate the geographic area that is covered by (or omitted from) a particular service or transport system, to help prioritize new investment. Passengers are often said to be willing to walk up to 400 metres to a bus stop or 800 metres to a railway station (El-Geneidy et al. 2014), leading to surrounding smaller or larger polygons representing the catchment within which people would be willing to travel. Such catchment areas have been criticised as being arbitrary or as underestimating the true sphere of influence of public transport nodes (El-Geneidy et al. 2014; Daniels and Mulley 2013). However, they nonetheless represent a good starting point from which the geographical and social distribution of service provision can be estimated.
Often catchment areas are calculated using straight-line (or “as the crow flies”) distances. This approach is appealing because it requires little additional data and is simple to compute (using buffer function) and understand.
The stplanr package provides functionality that calculates
catchment areas using straight-line distances with the
calc_catchment function. This function takes a
"SpatialPolygonsDataFrame" object that contains the
population (or other) data, typically from a census, and a
Spatial* layer that contains the geometry of the transport
facility. These two layers are overlayed to calculate statistics for the
desired catchments including "proportioning" polygons to account for the
proportion located within the catchment area.
To illustrate this functionality, the following code chunk reads-in
sample datasets stored in the common ESRI Shapefile format using the
readOGR function from rgdal. The resulting object
smallsa1 contains population data for Statistical Area 1
(SA1) zones in Sydney, Australia. The testcycleway data set
contains hypothetical cycleways aligned to streets in Sydney:
data_dir <- system.file("extdata", package = "stplanr")
unzip(file.path(data_dir, 'smallsa1.zip'))
unzip(file.path(data_dir, 'testcycleway.zip'))
sa1income <- rgdal::readOGR(".", "smallsa1")
testcycleway <- rgdal::readOGR(".", "testcycleway")
# Remove unzipped files
file.remove(list.files(pattern = "^(smallsa1|testcycleway).*"))
The calculation of catchment areas requires two parameters: a vector containing column names to calculate statistics and a distance. Since proportioning the areas assumes projected data, unprojected data are automatically projected to either a common projection (if one is already projected) or a specified projection. It should be emphasized that the choice of projection is important and has an effect on the results meaning setting a local projection is recommended to achieve the most accurate results. The catchment area is calculated as follows:
catch800m <- calc_catchment(
polygonlayer = sa1income,
targetlayer = testcycleway,
calccols = c('Total'),
distance = 800,
projection = 'austalbers',
dissolve = TRUE
)
The result can be used to calculate the total population within the
catchment areas of the cycleway, with the command
sum(catch800m$Total): 39418 people. The catchment area can
then be visualised (see Figure 5):
plot(sa1income, col = "light grey")
plot(catch800m, col = rgb(1, 0, 0, 0.5), add = TRUE)
plot(testcycleway, col = "green", add = TRUE)
This simplistic catchment area is useful when the straight-line
distance is a reasonable approximation of the route taken to walk (or
cycle) to a transport facility. However, this is often not the case. The
catchment area in Figure 5 initially
appears reasonable but the red-shaded catchment area includes an area
that requires travelling around a bay to access from the
(green-coloured) cycleway: users may have access but that does
not mean it’s accessible. To allow for more realistic catchment
areas for most situations, stplanr provides the
calc_network_catchment function that uses the same
principle as calc_catchment but also takes into account the
transport network.
To use calc_network_catchment, a transport network, as a
"SpatialLinesNetwork" object, must be provided. This
combines a "SpatialLinesDataFrame" object with a graph
network (using the igraph
package) to allow routing (estimation of shortest paths). The network is
used to calculate the shortest actual paths within the specific
catchment distance, as demonstrated in the following code chunk:
unzip(file.path(data_dir, 'sydroads.zip'))
sydroads <- rgdal::readOGR(".", "roads")
file.remove(list.files(pattern = "^(roads).*"))
sydnetwork <- SpatialLinesNetwork(sydroads)
The network catchment is then calculated using a similar method as
with calc_catchment but with a few minor changes.
Specifically these are including the SpatialLinesNetwork,
and using the maximpedance parameter to define the
distance, with distance being the additional distance from the network.
In contrast to the distance parameter that is based on the straight-line
distance in both the calc_catchment and
calc_network_catchment functions, the
maximpedance parameter is the maximum value in the units of
the network’s weight attribute. In practice this is generally distance
in metres but can also be travel times, risk or other measures.
netcatch800m <- calc_network_catchment(
sln = sydnetwork,
polygonlayer = sa1income,
targetlayer = testcycleway,
calccols = c('Total'),
maximpedance = 800,
distance = 100,
projection = 'austalbers'
)
Once calculated, the network catchment area can be used just as the straight-line network catchment. This includes extracting the catchment population of 23457 and plotting the original catchment area together with the original area with the results shown in Figure 6:
plot(sa1income, col = "light grey")
plot(catch800m, col = rgb(1, 0, 0, 0.5), add = TRUE)
plot(netcatch800m, col = rgb(0, 0, 1, 0.5), add = TRUE)
plot(testcycleway, col = "green", add = TRUE)
The calc_catchment and
calc_network_catchment functions are complementary
functions allowing for the computation of catchment areas. Although for
many transport applications it would be better to use the
calc_network_catchment function that uses the true network
distances, it may still be useful to use the straight-line version in
some cases. These include calculating the population that may be
affected by noise levels from transport infrastructure (such as a
motorway) where the effects of noise are not limited by the network. In
situations where network data is not available it would be possible to
use the calc_catchment function together with a fixed
assumption about what straight-line distance corresponds to the desired
network distance (i.e., 1km network = 700m straight-line). It may also
be appropriate to use the straight-line distance when the network is not
a reasonable constraint. This may be because the infrastructure of
interest is accessed primarily by walking or cycling in an area where
these are not limited to the road network (such as parks or shopping
areas with passthroughs or overpasses available to pedestrians).
Route-allocated lines allow estimation of route distance and therefore circuity (\(Q\)) (route distance divided by Euclidean distance) (Levinson and El-Geneidy 2009):
\[Q = \frac{d_{Rf}}{d_E},\]
where (\(d_E\)) and (\(d_{Rf}\)) represent Euclidean and fastest route distance respectively.
These variables can help model the rate of flow between origins and
destination, as illustrated in the left-hand panel of Figure 7. The code below demonstrates how objects
generated by stplanr can be used to undertake such analysis,
with the line_length function used to find the distance, in
meters, of lat/lon data.
l$d_euclidean <- line_length(l)
l$d_rf <- routes_fast@data$length
plot(l$d_euclidean, l$d_rf,
xlab = "Euclidean distance", ylab = "Route distance")
abline(a = 0, b = 1)
abline(a = 0, b = 1.2, col = "green")
abline(a = 0, b = 1.5, col = "red")
The left hand panel of Figure 7 shows the expected strong correlation between Euclidean (\(d_E\)) and fastest route (\(d_{Rf}\)) distance. However, some OD pairs have a proportionally higher route distance than others, as illustrated by distance from the black line in the plot.
An extension to the concept of circuity is the "quietness diversion factor" (\(QDF\)) of a desire line (Robin Lovelace et al. 2017), the ratio of the route distance of a quiet route option (\(d_{Rq}\)) to that of the fastest:
\[QDF = \frac{d_{Rq}}{d_{Rf}}.\]
Thanks to the "quietest" route option provided by
route_cyclestreet, we can estimate average values for both
metrics as follows:
routes_slow <- line2route(l, route_cyclestreet, plan = "quietest")
l$d_rq <- routes_slow$length # quietest route distance
Q <- mean(l$d_rf / l$d_euclidean, na.rm = TRUE)
QDF <- mean(l$d_rq / l$d_rf, na.rm = TRUE)
Q
#> [1] 1.298767
QDF
#> [1] 1.034721
The results show that cycle paths are not particularly direct in the study region by international standards (CROW 2007). This is hardly surprisingly given the small size of the sample and the short distances covered: \(Q\) tends to decrease at a decaying rate with distance, meaning longer paths tend to be more direct. What is surprising is that the quietness diversion factor is close to unity, which could imply that the quiet routes are constructed along direct, and therefore sensible routes. When time is explored, we find that the "quietness diversion factor with respect to time" (\(QDF_t\)) is slightly larger, although larger datasets would be needed for any inferences to be made:
(QDFt <- mean(routes_slow$time / routes_fast$time, na.rm = TRUE))
#> [1] 1.052855
The second step in the four-stage transport model (after trip
generation) is trip distribution. This involves identifying destinations
for trips generated in each zone (creating OD data), which can be done
with a gravity model, logit model, or other technique under the broad
banner of "spatial interaction model" (SIM). The stplanr
package is not intended primarily as an SIM package, but could be
extended in this direction, as illustrated by the
od_radiation function, which implements a SIM proposed by
Simini et al. (2012). Further development
in this direction (or integration with a package dedicated to SIMs)
would complement stplanr’s focus on geographic functions.
At present there are no functions for modelling distance decay, but this is something we would like to add in future versions of stplanr. Although it is only one of several methods used for modelling trip distribution as part of four-step models, distance decay is an especially important concept for sustainable transport planning due to physical limitations on the ability of people to walk and cycle large distances (Iacono, Krizek, and El-Geneidy 2010). It can also be used to model relationships between locations in situations that are sensitive to distance (cycling facilities for instance) in addition to its use in four-step models.
We can explore the relationship between distance and the proportion
of trips made by walking using the same object l generated
by stplanr.
l$pwalk <- l$On.foot / l$All
plot(l$d_euclidean, l$pwalk, cex = l$All / 50,
xlab = "Euclidean distance (m)", ylab = "Proportion of trips by foot")
Based on the right-hand panel in Figure 7, there is a clear negative relationship
between distance of trips and the proportion of those trips made by
walking. This is unsurprising: beyond a certain distance (around 1.5km
according the the data presented in the figure above) walking is usually
seen as too slow and other modes are considered. This "distance decay"
is non-linear and can be approximated by a range of functional forms
(Martínez and Viegas 2013). From the range
of options we test below just two forms. We will compare the ability of
linear and log-square-root functions to fit the data contained in
l for walking.
lm1 <- lm(pwalk ~ d_euclidean, data = l@data, weights = All)
lm2 <- lm(pwalk ~ d_rf, data = l@data, weights = All)
lm3 <- glm(pwalk ~ d_rf + I(d_rf^0.5),
data = l@data, weights = All, family = quasipoisson(link = "log"))
The results of these regression models can be seen using
summary(). Surprisingly, Euclidean distance was a better
predictor of walking than route distance, but no strong conclusions can
be drawn from this finding, with such a small sample of desire lines (n
= 42). The results are purely illustrative, of the possibilities created
by using stplanr in conjunction with R’s modelling capabilities
(see Figure 8).
plot(l$d_euclidean, l$pwalk, cex = l$All / 50,
xlab = "Euclidean distance (m)", ylab = "Proportion of trips by foot")
l2 <- data.frame(d_euclidean = 1:5000, d_rf = 1:5000)
lm1p <- predict(lm1, l2)
lm2p <- predict(lm2, l2)
lm3p <- predict(lm3, l2)
lines(l2$d_euclidean, lm1p)
lines(l2$d_euclidean, exp(lm2p), col = "green")
lines(l2$d_euclidean, exp(lm3p), col = "red")
Visualization is an important aspect of any transport study, as it enables researchers to communicate their findings to other researchers, policy-makers, and, ultimately, the public. It may therefore come as a surprise that stplanr contains no functions for visualisation. Instead, users are encouraged to make use of existing spatial visualisation tools in R, such as tmap, leaflet, and ggmap (Cheshire and Lovelace 2015; Kahle and Wickham 2013).
Furthermore, with the development of online application frameworks, such as shiny, it is now easier than ever to make the results of transport analysis and modelling projects available to the public. There is great potential to expand on the principle of publicly accessible transport planning tools via "web apps", perhaps through new R packages dedicated to visualising and communicating transport data.
This paper demonstrates that transport planning can be done with R. We set out reasons for using open source software in general and command-line interfaces in particular for reproducibility, transparency, and democratic accountability. This is, to our knowledge, the first R package aimed at enabling practitioners and researchers to design more sustainable transport systems. Much more is possible in this direction.
The stplanr package focuses on geographic analysis of transport systems, a niche that is apparent from existing tools. Proprietary software such as INRO, PTV Visum, and TransCAD dominate geospatial methods in applied transport planning. This differs from other fields that use geospatial methods such as spatial epidemiology and ecology, where open source software dominate. Furthermore, the successful "UNIX approach" to software development implies that each piece of software does a limited number of things well. To avoid the pitfall of "feature creep" it makes sense to specialise.
The lack of existing packages that do the same job direct our
attention towards stplanr’s basic functionality rather than
more advanced features. A good example of this is the
SpatialLinesNetwork class which could provide a basis for
future developments including the recently implemented
route_local function. This direction of "incremental
growth" could proceed by adding interfaces to more transport planning
APIs, including Routino and Directions provided by the
mapping company Mapbox.
Another example of the potential to develop basic functionality
rather than new advanced features is the the possibility of creating an
"OD" class to represent origin-destination data, which
could ease the geographical analysis of OD data via generic methods.
Building on the example data presented in previous sections the idea is
for a single OD object to be able represent
flowlines, routes_fast and
routes_quiet. The recent development of the sf,
with its support for multiple geometry columns, makes this option more
feasible.
Another direction of travel is towards better support for large
datasets. We have demonstrated stplnr’s capabilities on
deliberately small datasets for ease of understanding. However, the
third criterion of transport planning software set-out in the
introduction — scalability — is vital to package’s real world
utility in the age of transport-related "Big Data" (Robin Lovelace et al. 2016). Some functions use
C++ (via Rcpp) for
computationally intensive operations, and we plan to continue the drive
toward performant code. Opportunities for improving performance include:
integration with transport databases (e.g., via the bikedata
package); support for batch routing APIs; parallel implementations of
time-consuming functions; and the refactoring of slow functions (e.g.,
overline, which takes hours to run on large datasets).
Rather than a one-size-fits-all package, we see stplanr as an additional tool in the transport planner’s cabinet. It is part of a wider movement that is making transport planning a more open and democratic process. Other developments in this movement include the increasing availability of open data (Naumova 2016) and the rise of open source products for transport modelling, such as SUMO, MATSim, MITSIMLAB (Saidallah, El Fergougui, and Elalaoui 2016), and pgRouting. In this context, stplanr’s focus on GIS operations represents a niche in the market, allowing it to complement such software and help make better use of new open data sources. The stplanr package could also be used alongside other R packages that use spatial transport data such as aspace and MCI (Wieland 2017).
The stplanr package was first developed to generate data for the Propensity to Cycle Tool (PCT), which estimates cycling potential down to the street level across all major cyclable routes in England and Wales (Robin Lovelace et al. 2017). We believe there are many other national and international transport planning challenges the package could be used to solve. In the context of the increasing availability of open access data6 packages for analyzing transport datasets. The stplanr package could help ensure that the evidence on which transport planning decisions are based is reproducible, systematic, and therefore democratically accountable.
We would like to thank a number of people and organisations who have supported the development of stplanr: the UK’s Department for Transport (DfT) for funding the Propensity to Cycle Tool (see www.pct.bike), which instigated code which eventually became stplanr; Colin Gillespie, whose teaching helped turn the code into a package; ROpenSci, for hosting the package (special thanks to Scott Chamberlin who reviewed the package); to Maëlle Salmon, Mark Padgham, Martin Lucas-Smith, and Marcus Young for commenting on early drafts of this paper; and to transport planning professionals Tom van Vuren (Mott MacDonald), John Parkin (University of the West of England), Helen Bowkett (Welsh Government), and Yaron Hollander, who provided input on the paper and ideas for future priorities.
Many people can think of things that could be improved on their local transport networks, especially for walking, cycling and wheel-chairs, but most lack the evidence to communicate the issues, and potential solutions, to others.↩︎
Note the inner_join function from the
dplyr package was used because it is substantially faster and
more effective than the equivalent function, merge, in base
R. This is communicated in a number of places, including on the website
zevross.com.
For this reason we import dplyr and use it internally for
joins.↩︎
The figure is based on the shortest path, although other criteria could be used by setting weights in the network accordingly — see https://github.com/ropensci/stplanr/issues/194 in the package’s issue tracker for details.↩︎
Such geographic datasets are best represented as in a spatial class system, explaining stplanr’s close integration with R’s spatial packages.↩︎
An API key is needed for this function to work. This can
be requested (or purchased for large scale routing) from cyclestreets.net/api/apply.
See ?route_cyclestreet for details. Thanks to Martin
Lucas-Smith and Simon Nuttall for making this possible. Note: an updated
interface to cyclestreets.net
has been made available via the package cyclestreets.↩︎
Two examples of this are the osmdata CRAN package (Padgham et al. 2017) and the Python package OSMnx (Boeing 2017), for downloading and processing open access datasets from OpenStreetMap.↩︎