Abstract
OpenStreetMap provides freely accessible and editable geographic data. The osmar package smoothly integrates the OpenStreetMap project into the R ecosystem. The osmar package provides infrastructure to access OpenStreetMap data from different sources, to enable working with the OSM data in the familiar R idiom, and to convert the data into objects based on classes provided by existing R packages. This paper explains the package’s concept and shows how to use it. As an application we present a simple navigation device.
“OpenStreetMap creates and provides free geographic data such as street maps to anyone who wants them” announces the OpenStreetMap wiki main page (OSM Foundation 2011) – and we think R users want free geographic data. Therefore, the add-on package osmar (Schlesinger and Eugster 2012) provides extensible infrastructure for integrating the OpenStreetMap project (OSM) into the R project.
The aim of the OpenStreetMap project is to create a free editable map of the world. The project maintains a database of geographic elements (nodes, ways and relations) and features (such as streets, buildings and landmarks). These data are collected and provided by volunteers using GPS devices, aerial imagery, and local knowledge. The most prominent application is the rendering of the geographic data and features into raster images (for example, for the OSM map on the website). However, the project also provides an application programming interface (API) for fetching raw data from and saving to the OSM database.
The OpenStreetMap project provides data in the OSM XML format, which consists of three basic elements:
Node: The basic element. It consists of the attributes latitude and longitude.
Way: An ordered interconnection of nodes to describe a linear feature (e.g., a street). Areas (e.g., buildings) are represented as closed ways.
Relation: A grouping of elements (nodes, ways, and relations), which are somehow geographically related (e.g., bus and cycle routes).
Each element has further attributes like the element ID (unique within the corresponding element group) and timestamp. Furthermore, each element may have an arbitrary number of tags (key-value pairs) which describe the element. Ways and relations, in addition, have references to their members’ IDs.
In order to access the data, OSM provides an application programming interface (API) over the hypertext transfer protocol (HTTP) for retrieving raw data from and writing data to the OSM database. The main API (currently in version 0.6) has calls to get elements (and all other elements referenced by them) by, among other things, their ID and a bounding box. However, the requests are limited (e.g., currently only an area of \(0.25\) square degrees can be queried). An (unlimited) alternative is provided by planet files. These are compressed OSM XML files containing different OSM database extracts (e.g., the entire world or an individual country or area). Planet files can be downloaded from the OSM wiki and processed using the command-line Java tool Osmosis (Henderson 2011).
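The size limit can be checked before a request is sent. The following is a minimal sketch in base R. The helper bbox_area_deg2() and the constant max_area are illustrative names and not part of osmar, and the value of the limit is simply the one quoted above, so it should be treated as an assumption.

```r
# Hypothetical helper (not part of osmar): area of a bounding box in
# square degrees, computed from its corner coordinates.
bbox_area_deg2 <- function(left, bottom, right, top) {
  stopifnot(right >= left, top >= bottom)
  (right - left) * (top - bottom)
}

# Limit quoted in the text for the main API; treat it as an assumption.
max_area <- 0.25

# A box of 0.01 x 0.01 degrees in central Auckland is far below the limit.
small <- bbox_area_deg2(174.76, -36.855, 174.77, -36.845)
small <= max_area

# A box of 1 x 1 degrees exceeds the limit; a planet file would be needed.
large <- bbox_area_deg2(174, -37, 175, -36)
large <= max_area
```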
For a complete introduction to the OSM project, the OSM API, and the OSM XML file format we refer to the project’s wiki available at http://wiki.openstreetmap.org/.
The aim of the package osmar is to provide extensible infrastructure to get and to represent the above described OSM data within R, to enable working with the OSM data in the familiar R idiom, and to convert the OSM data to objects based on classes provided by other packages. Figure 1 visualizes the package’s concept. This idea differs from the one followed by existing packages such as OpenStreetMap (Fellows 2012), RgoogleMaps (Loecher 2012), and ggmap (Kahle and Wickham 2012). Whereas these packages provide access to the already rendered data (i.e., raster images), osmar enables the usage of the raw OSM data.
In the following section we present the package’s implementation and usage. Note that we try to increase readability by only showing the relevant arguments of plot statements. We refer to the “navigator” demo in the osmar package for the actual plot statements.
We begin by defining the data source. Currently two sources,
HTTP-API and planet files, are supported. In this section we use the API
of the OSM project with the default URL url = http://api.openstreetmap.org/api/0.6/:
> library("osmar")
Loading required package: XML
Loading required package: RCurl
Loading required package: bitops
Loading required package: gtools
Loading required package: geosphere
Loading required package: sp
Attaching package: 'osmar'
The following object(s) are masked
from 'package:utils':
find
> src <- osmsource_api()
We can retrieve elements by using the IDs of the elements. The IDs in these examples have been extracted by hand from the OpenStreetMap website (via its export functionality). For example, one node:
> get_osm(node(18961430), source = src)
osmar object
1 nodes, 0 ways, 0 relations
Or one way, either with the way-related data only or together with the data for all referenced elements (nodes and relations):
> get_osm(way(3810479), source = src)
osmar object
0 nodes, 1 ways, 0 relations
> get_osm(way(3810479), source = src, full = TRUE)
osmar object
11 nodes, 1 ways, 0 relations
The first statement retrieves the way only (because the default value
of the full argument is FALSE). The second
statement additionally retrieves all nodes that are members of the way
(i.e., all nodes that define the way).
The second possibility to retrieve elements is to specify a bounding
box by defining the left, bottom, right, and top coordinates
(corner_bbox()), or the center point and width and height
in meters (center_bbox()):
> bb <- center_bbox(174.76778, -36.85056, 700, 700)
> ua <- get_osm(bb, source = src)
> ua
osmar object
2427 nodes, 428 ways, 70 relations
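The relation between the two ways of specifying a bounding box can be sketched in base R. The helper approx_corners() below is hypothetical and is not how center_bbox() is implemented. It assumes a spherical Earth with roughly 111,320 m per degree of latitude, which is only an approximation.

```r
# Hypothetical helper (not osmar's center_bbox()): converts a center point
# plus width and height in meters into corner coordinates, assuming a
# spherical Earth with about 111,320 m per degree of latitude.
approx_corners <- function(center_lon, center_lat, width, height) {
  m_per_deg_lat <- 111320
  # A degree of longitude shrinks with the cosine of the latitude.
  m_per_deg_lon <- m_per_deg_lat * cos(center_lat * pi / 180)
  half_w <- (width / 2) / m_per_deg_lon
  half_h <- (height / 2) / m_per_deg_lat
  c(left = center_lon - half_w, bottom = center_lat - half_h,
    right = center_lon + half_w, top = center_lat + half_h)
}

# Same center and extent as in the example above.
bb_approx <- approx_corners(174.76778, -36.85056, 700, 700)
bb_approx
```

The resulting box spans more degrees in longitude than in latitude, because a degree of longitude covers less ground away from the equator.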
The use of planet files via Osmosis as source works analogously. The
source is specified by the function osmsource_osmosis().
The function’s two arguments are the path to the planet file
(file) and the path to the osmosis tool
(osmosis = "osmosis"). Note that by default it is assumed
that the Osmosis executable is in your PATH environment
variable. The navigator example demonstrates the usage of planet
files.
The retrieved osmar object is a list with the three
elements nodes, ways, and
relations. Each element again is a list containing
data.frames for the attributes (the attrs list
element) and meta-data (the tags list element) of the OSM
elements. Ways and relations additionally have a data.frame
containing their members (the refs list element).
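To make this nested layout concrete, the following base R sketch builds a toy list with the same shape. It is not created by osmar. All IDs, coordinates, and tags are made up, and the column names of the refs data.frames are assumptions chosen for illustration.

```r
# Toy stand-in for the structure described above (not created by osmar;
# all values are invented, and the refs column names are assumptions).
toy <- list(
  nodes = list(
    attrs = data.frame(id = c(1, 2, 3),
                       lat = c(-36.850, -36.851, -36.852),
                       lon = c(174.767, 174.768, 174.769)),
    tags  = data.frame(id = c(1, 3),
                       k = c("highway", "amenity"),
                       v = c("traffic_signals", "cafe"))
  ),
  ways = list(
    attrs = data.frame(id = 10),
    tags  = data.frame(id = 10, k = "highway", v = "residential"),
    refs  = data.frame(id = c(10, 10, 10), ref = c(1, 2, 3))
  ),
  relations = list(
    attrs = data.frame(id = numeric(0)),
    tags  = data.frame(id = numeric(0), k = character(0), v = character(0)),
    refs  = data.frame(id = numeric(0), ref = numeric(0))
  )
)

# Each top-level element is itself a list of data.frames.
names(toy)
sapply(toy$ways, nrow)
```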
For each element nodes, ways, and
relations of an osmar object an individual
summary method is available. The overall summary method for an
osmar object shows the three individual summaries all at
once.
> summary(ua$nodes)
osmar$nodes object
2427 nodes, 771 tags
..$attrs data.frame:
id, lat, lon, user, uid, visible, version,
changeset, timestamp
..$tags data.frame:
id, k, v
Bounding box:
lat lon
min -36.85661 174.7627
max -36.84472 174.7753
Key-Value contingency table:
Key Value Freq
1 addr:city Auckland 101
2 addr:street Queen Street 61
3 addr:country NZ 40
4 addr:postcode 1010 39
5 comment Determined via Keypa... 29
6 addr:street Symonds Street 27
7 highway traffic_signals 23
8 addr:street Lorne Street 19
9 highway bus_stop 15
10 amenity cafe 11
In the case of the summary for nodes, the number of
elements and tags, as well as the available variables for each
corresponding data.frame are shown. The bounding box of the
coordinates and a contingency table of the top ten most frequently
available key-value pairs are printed.
The summaries for the other two elements ways and
relations are similar. Note that these methods in fact
return the contingency table of all available key-value pairs and, in
addition, further information which is not printed but may be useful for
a descriptive analysis. We refer to the help pages (e.g.,
?summary.nodes) for a detailed description of the return
values.
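A key-value contingency table of this kind can be reproduced with base R alone. The sketch below uses a made-up tags data.frame with the columns id, k, and v shown in the summary. It illustrates the idea and is not the code used by the summary methods.

```r
# Made-up tags data.frame with the columns id, k, v shown in the summary.
tags <- data.frame(
  id = c(1, 2, 3, 4, 5, 6),
  k  = c("highway", "highway", "highway", "amenity", "amenity", "highway"),
  v  = c("bus_stop", "traffic_signals", "traffic_signals", "cafe", "cafe",
         "traffic_signals"),
  stringsAsFactors = FALSE
)

# Count each key-value combination and drop combinations that never occur.
kv <- as.data.frame(table(Key = tags$k, Value = tags$v),
                    stringsAsFactors = FALSE)
kv <- kv[kv$Freq > 0, ]

# Sort by decreasing frequency and show the most frequent pairs.
kv <- kv[order(-kv$Freq), ]
rownames(kv) <- NULL
head(kv, 10)
```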
In order to find specific elements within the osmar
object, the find() function allows the object to be queried
with a given condition. As osmar objects are based on
data.frames, the condition is essentially a logical
expression indicating the rows to keep. In addition, one has to specify
to which element (nodes, node(); ways, way();
or relations, relation()) and to which data (attributes,
attrs(); meta-data, tags(); or members,
refs()) the condition applies.
If, for example, we want to find all traffic signal nodes, we know
from the object’s summary that the corresponding value in the
tags data.frame is "traffic_signals". We can
express this condition as follows:
> ts_ids <- find(ua, node(tags(v == "traffic_signals")))
> ts_ids
[1] 25769635 25769637 25769641 ...
The result is a vector with node IDs (in this case 25 traffic signal
nodes) or NA. If the correct spelling is unknown, the
defined binary operators %agrep% for approximate matches
(see ?agrep) and %grep% for pattern matches
(see ?grepl) can be used:
> bs_ids <- find(ua, node(tags(v %agrep% "busstop")))
> bs_ids
[1] 678301119 737159091 1318401034 ...
This returns \(15\) bus stops
available in the ua osmar object.
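The three kinds of condition (exact, approximate, and pattern match) can be illustrated on a plain data.frame with the base R functions that the help pages mentioned above refer to. The node tags below are made up, and the sketch only mimics what find() does conceptually.

```r
# Made-up node tags; the columns id, k, v follow the summary output above.
node_tags <- data.frame(
  id = c(101, 102, 103, 104),
  k  = c("highway", "highway", "amenity", "highway"),
  v  = c("bus_stop", "traffic_signals", "cafe", "bus_stop"),
  stringsAsFactors = FALSE
)

# Exact match: rows whose value equals "traffic_signals".
ts_toy <- unique(node_tags$id[node_tags$v == "traffic_signals"])

# Approximate match: agrepl() tolerates the missing underscore in "busstop".
bs_toy <- unique(node_tags$id[agrepl("busstop", node_tags$v)])

# Pattern match: grepl() with a regular expression on the key.
hw_toy <- unique(node_tags$id[grepl("^high", node_tags$k)])

ts_toy
bs_toy
hw_toy
```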
We use the functions find_down() and
find_up() to find all related elements for given element
IDs. The OSM basic elements define a hierarchy, \[\text{node} \leftarrow \text{way} \leftarrow
\text{relation,}\] and these two functions enable us to find the
related elements up and down the hierarchy. For example,
find_up() on a node returns all related nodes, ways, and
relations; find_down() on a node returns only the node
itself. On the other hand, find_up() on a relation returns
only the relation itself; find_down() on a relation returns
the relation and all related ways and nodes.
> hw_ids <- find(ua, way(tags(k == "highway")))
> hw_ids <- find_down(ua, way(hw_ids))
In this example we find all ways that have a tag with the
k attribute set to "highway". These contain
hardened and recognised land routes between two places used by motorised
vehicles, pedestrians, cyclists, etc. The return value of
find_down() and find_up() is a list containing
the element IDs:
> str(hw_ids)
List of 3
$ node_ids : num [1:1321] 25769641 ...
$ way_ids : num [1:253] 4309608 ...
$ relation_ids: NULL
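Moving down the hierarchy amounts to a lookup in the members table. The base R sketch below uses a made-up refs data.frame and a hypothetical helper find_down_toy(). It is not the implementation of find_down(), and the column names id and ref are assumptions.

```r
# Made-up way members: each row links a way ID to one member node ID.
way_refs <- data.frame(id  = c(10, 10, 10, 20, 20),
                       ref = c(1, 2, 3, 3, 4))

# Hypothetical helper (not osmar's find_down()): for the given way IDs,
# return the way IDs themselves plus the IDs of all member nodes.
find_down_toy <- function(refs, way_ids) {
  members <- refs$ref[refs$id %in% way_ids]
  list(node_ids = unique(members),
       way_ids = way_ids,
       relation_ids = NULL)
}

str(find_down_toy(way_refs, 10))

# Node 3 is shared by both ways but is reported only once.
down_all <- find_down_toy(way_refs, c(10, 20))
down_all$node_ids
```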
The return value of the find functions then can be used to create
subsets of osmar objects. The subset() method
for osmar objects takes element IDs and returns the
corresponding data as osmar objects. For example, the two
subsets based on the traffic signal and bus stop element IDs are:
> ts <- subset(ua, node_ids = ts_ids)
> ts
osmar object
25 nodes, 0 ways, 0 relations
> bs <- subset(ua, node_ids = bs_ids)
> bs
osmar object
15 nodes, 0 ways, 0 relations
The subset based on the highway element IDs is:
> hw <- subset(ua, ids = hw_ids)
> hw
osmar object
1321 nodes, 253 ways, 0 relations
Note that the subsetting of osmar objects is divided
into the two steps “finding” and “subsetting” to have more flexibility
in handling the related elements (here by using
find_down() and find_up(), but more
sophisticated routines can be imagined).
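The "subsetting" step can be pictured as filtering every data.frame of an element by ID. The following base R sketch uses made-up node data and a hypothetical helper subset_toy(). It is not the subset() method of osmar.

```r
# Made-up node data in the attrs/tags layout described in the text.
nodes <- list(
  attrs = data.frame(id = c(1, 2, 3),
                     lat = c(-36.850, -36.851, -36.852),
                     lon = c(174.767, 174.768, 174.769)),
  tags  = data.frame(id = c(1, 3),
                     k = c("highway", "amenity"),
                     v = c("traffic_signals", "cafe"))
)

# Hypothetical helper (not osmar's subset() method): keep only the rows of
# every data.frame whose id is among the requested IDs.
subset_toy <- function(element, ids) {
  lapply(element, function(df) df[df$id %in% ids, , drop = FALSE])
}

sub <- subset_toy(nodes, ids = c(1, 2))
nrow(sub$attrs)  # two nodes kept
nrow(sub$tags)   # only node 1 carries a tag
```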
The visualization of osmar objects is possible if nodes
are available in the object (as only these OSM elements contain latitude
and longitude information). The functions plot_nodes() and
plot_ways() plot the available nodes as dots and ways as
lines, respectively. The plot() method combines these two
function calls. Note that this is a plot of the raw data and no
projection is done (see the following section for a projected
visualization).
> plot(ua)
> plot_ways(hw, add = TRUE, col = "green")
> plot_nodes(ts, add = TRUE, col = "red")
> plot_nodes(bs, add = TRUE, col = "blue")
In order to use the complete power of R on OpenStreetMap data, it is
essential to be able to convert osmar objects into commonly
used objects based on classes provided by other packages. Currently,
osmar provides two converters: to objects of the sp package (Bivand, Pebesma, and Gomez-Rubio 2008) and
to objects of the igraph package (Csardi 2011). In this section we
show the conversion to sp objects; the navigation device
example shows the conversion to igraph objects.
The sp package provides special data structures and utility
functions for spatial data. Spatial data classes are available for
points, lines, polygons, and others (see
Bivand, Pebesma, and Gomez-Rubio 2008). The osmar
package provides the as_sp() function,
> args(as_sp)
function(obj, what = c("points", "lines", "polygons"),
crs = osm_crs(), simplify = TRUE)
NULL
to convert an osmar object into the corresponding
classes for points, lines, and polygons in the sp package
(given the required data are available). Note that the appropriate WGS84
coordinate reference system (CRS) for OpenStreetMap data is used
(cf. osm_crs()).
Polygons are used to represent areas, which OpenStreetMap represents
as closed ways. Buildings, for example, are closed ways and can be
converted to an sp polygon object:
> bg_ids <- find(ua, way(tags(k == "building")))
> bg_ids <- find_down(ua, way(bg_ids))
> bg <- subset(ua, ids = bg_ids)
> bg
osmar object
991 nodes, 110 ways, 0 relations
> bg_poly <- as_sp(bg, "polygons")
The result is a SpatialPolygonsDataFrame with the
osmar object’s attributes (the attrs element)
as its data. Functionality provided by the sp package can now
be used to analyze the OSM data, e.g., the summary() method
or the spplot() method. The latter can be used, for example, to see
how often each building was modified:
> spplot(bg_poly, c("version"))
Meta-data (the tags element) and members (the
refs element) are not automatically converted to a
SpatialPolygonsDataFrame’s data.
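Whether a way can serve as a polygon outline depends on it being closed, i.e., on its first and last member node being identical. The base R sketch below checks this on made-up data. The helper is_closed() and the refs column names are hypothetical, and the sketch does not show how as_sp() works internally.

```r
# Made-up node coordinates and way members (in the order they appear).
node_attrs <- data.frame(id  = c(1, 2, 3, 4),
                         lon = c(174.767, 174.768, 174.768, 174.767),
                         lat = c(-36.850, -36.850, -36.851, -36.851))
way_refs <- data.frame(id  = c(10, 10, 10, 10, 10, 20, 20),
                       ref = c(1, 2, 3, 4, 1, 1, 3))

# Hypothetical helper: a way is closed if its first and last member coincide.
is_closed <- function(refs, way_id) {
  members <- refs$ref[refs$id == way_id]
  length(members) > 2 && members[1] == members[length(members)]
}

is_closed(way_refs, 10)  # closed ring, usable as a polygon outline
is_closed(way_refs, 20)  # open way, only usable as a line

# Coordinates of the ring, in member order.
ring <- node_attrs[match(way_refs$ref[way_refs$id == 10], node_attrs$id),
                   c("lon", "lat")]
ring
```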
The conversion of lines and points works similarly:
> hw_line <- as_sp(hw, "lines")
> bs_points <- as_sp(bs, "points")
The results are SpatialLinesDataFrame and
SpatialPointsDataFrame objects, respectively.
In order to complete the University of Auckland example we create a
bus route map and visualize the available bus routes belonging to the
bus stops. To this end, we find all bus relations available
in the object, retrieve the corresponding data from the OSM API, and
convert the data into lines (note that this computation takes some
time):
> bus_ids <- find(ua, relation(tags(v == "bus")))
> bus <- lapply(bus_ids,
+ function(i) {
+ raw <- get_osm(relation(i), full = TRUE)
+ as_sp(raw, "lines")
+ })
We use the argument full = TRUE to retrieve the relation
itself and all related members. In detail, this means we retrieve all
nodes, ways, and relations that are members of the specified relation;
and, recursively, all nodes that are members of the retrieved ways.
We then use the sp plot methods to create the final bus route map:
> plot(bg_poly, col = "gray")
> plot(hw_line, add = TRUE, col = "green")
> plot(bs_points, add = TRUE, col = "blue")
> for ( i in seq_along(bus) ) {
+ plot(bus[[i]], add = TRUE, col = "blue")
+ }
The osmar package extends the R ecosystem with
infrastructure to work together with the OpenStreetMap project. So far,
functionality is available to get data from different sources (e.g.,
planet file and API v0.6), to consolidate the data as an R
osmar object, to work with the osmar object
(e.g., subsetting and plotting), and to convert it to objects based on
classes provided by other packages (e.g., to igraph and
sp objects).
Future work includes the implementation of further converters; e.g.,
a converter from osmar objects to raster image objects via
the OpenStreetMap package. We are also interested in
implementing converters from objects provided by other R packages to
osmar objects and in saving these objects in different
OpenStreetMap sources via a put_osm() function. This would
be, in fact, the completion of the osmar concept illustrated in
Figure 1 with arrows from the right to
the left blocks.
Furthermore, we would like to incorporate tools originating in the OpenStreetMap ecosystem. One idea is the implementation of a rule-based rendering tool for generating SVG images of OSM data along the lines of Osmarender (Jochen Topf 2011). Another interesting project is Osmium, a fast and flexible C++ and JavaScript toolkit and framework for working with OSM data (J. Topf 2012). An R interface (potentially via Rcpp modules; Eddelbuettel and François 2011) would provide a very fast and flexible way to work with large OSM data sets.
The authors thank two anonymous reviewers and Joe Sakshaug for their constructive comments to improve the manuscript.