Abstract
There is a strong software engineering culture in the R developer community. We recommend creating, updating and vetting packages as well as keeping up with community standards. We invite contributions to the rOpenSci project, where participants can gain experience that will shape their work and that of their peers.
The R programming language was originally created for statisticians, by statisticians, but evolved over time to attract a “massive pool of talent that was previously untapped” (Hadley Wickham in Thieme (2018)). Despite the fact that most R users are academic researchers and business data analysts without a background in software engineering, we are witnessing a rapid rise in software engineering within the community. In this comment we spotlight recent progress in tooling, dissemination and support, including specific efforts led by the rOpenSci project. We hope that readers will take advantage of and participate in the tools and practices we describe.
The basic infrastructure for creating, building, installing, and checking packages has been in place since the early days of the R language. During this time (1998-2011), the barriers to entry were very high, and access to support and Q&A for beginners was extremely limited. With the introduction of the devtools (Wickham, Hester, and Chang 2021) package in 2011, the process of creating and updating packages became substantially easier. Documentation also became simpler to maintain: the roxygen2 (Wickham et al. 2021) package allowed developers to keep documentation in sync with changes in code, similar to the doxygen approach embraced in more mature languages. Combined with the rise in popularity of StackOverflow and the growth of rstats blogs, these tools helped the number of packages on the Comprehensive R Archive Network (CRAN) skyrocket from 400 new packages in 2010 to 1000 new packages by 2014. As of this writing, there are nearly 19k packages on CRAN.
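To illustrate how documentation can live next to the code it describes, here is a minimal sketch of a function documented with roxygen-style comments. The function `celsius_to_fahrenheit()` is a hypothetical example and does not come from any package discussed here; the tags used (`@param`, `@return`, `@examples`, `@export`) are standard roxygen2 tags.

```r
#' Convert temperatures from Celsius to Fahrenheit
#'
#' Hypothetical example function, used only to show roxygen-style comments.
#'
#' @param celsius A numeric vector of temperatures in degrees Celsius.
#' @return A numeric vector of the same length, in degrees Fahrenheit.
#' @examples
#' celsius_to_fahrenheit(c(0, 100))
#' @export
celsius_to_fahrenheit <- function(celsius) {
  # Fail early on non-numeric input so the documented contract holds
  stopifnot(is.numeric(celsius))
  celsius * 9 / 5 + 32
}
```

Inside a package set up for roxygen2, calling `devtools::document()` would typically regenerate the help files from such comments, so a change to the function and a change to its documentation can be made in the same place.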
For novices without substantial software engineering experience, the early testing frameworks were also difficult to use. With the release of testthat (Wickham 2011), testing also became smoother. There are now several actively maintained testing frameworks, such as tinytest (van der Loo 2020), as well as testthat-compatible specialized tooling for testing database interactions (dittodb (Keane and Vargas 2020)) and web resources (vcr (Chamberlain 2021), httptest (Richardson 2021), and webfakes (Csárdi 2021), which enables the use of an embedded C/C++ web server for testing HTTP clients like httr (Wickham 2021)).
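As a rough illustration of what a unit test looks like in this style, the sketch below uses `test_that()`, `expect_equal()` and `expect_error()` from testthat. It assumes the testthat package is installed; the helper `rescale01()` is a hypothetical function written only for this example.

```r
library(testthat)

# Hypothetical helper: rescale a numeric vector to the range [0, 1]
rescale01 <- function(x) {
  stopifnot(is.numeric(x))
  rng <- range(x, na.rm = TRUE)
  (x - rng[1]) / (rng[2] - rng[1])
}

test_that("rescale01() maps the minimum to 0 and the maximum to 1", {
  expect_equal(rescale01(c(0, 5, 10)), c(0, 0.5, 1))
})

test_that("rescale01() rejects non-numeric input", {
  expect_error(rescale01("a"))
})
```

In a package, tests like these would usually be placed in files under `tests/testthat/` so that they run as part of the package checks.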
The testthat package has recently been improved with snapshot tests that make it possible to test plot outputs. The rOpenSci project has released autotest (Padgham 2021), a package that supports automatic mutation testing.
Beyond checking for compliance with R CMD check, several other packages such as goodpractice (Csárdi and Frick 2018), riskmetric (R Validation Hub et al. 2021), and rOpenSci’s pkgcheck (Padgham and Salmon 2021) check packages against a large list of actionable, community-recommended best practices for software development. Collectively, these tools allow domain researchers to release software packages that meet high standards for software engineering.
The development and testing ecosystem of R is rich and has sometimes borrowed successful implementations from other languages (e.g. the vcr R package is a port, i.e. a translation to R, of the vcr Ruby gem; testthat snapshot tests were inspired by the JavaScript testing framework Jest).
As underlined in Thieme (2018), community is the strong suit of the R language. Many organizations and venues offer dedicated support for package developers. Examples include Q&A on the r-package-devel mailing list, the package development category of the RStudio community forum, and the rstats section of StackOverflow. Traditionally, R package developers have been mostly male and white. Although the status quo remains similar, efforts from groups such as R-Ladies meetups, Minorities in R (Scott and Smalls-Perkins 2020), and the package development modules offered by Forwards for underrepresented groups have made considerable inroads towards improving diversity. These efforts have worked hard to put the spotlight on developers beyond the “usual suspects”.
The rOpenSci organization (Boettiger et al.
2015) is an attractive venue for developers and supporters of
scientific R software. One of our most successful and continuing
initiatives is our Software Peer Review system (Ram et al. 2019), a combination of academic
peer review and code review from industry. About 150 packages have been
reviewed by volunteers to date, creating better packages as well as a
growing knowledge base in our development guide (rOpenSci et al. 2021) while also building a
living community of practice.
Our model has been the fundamental inspiration for projects such as the
Journal of Open Source Software (Smith et al.
2018) and PyOpenSci (Wasser and Holdgraf
2019; Trizna, Wasser, and Nicholson
2021). We are continuously improving our system and reducing
cognitive overload on editors and reviewers by automating repetitive
tasks. Most recently we have expanded our offerings to peer review of
packages that implement statistical methods (Statistical Software Peer
Review) (Padgham et al. 2021).
Besides software review, the rOpenSci community is a safe, welcoming and
informative place for package developers, with Q&A happening on our
public forum and semi-open Slack workspace (Butland and LaZerte 2020).
The aforementioned tools, venues and organizations benefit from and
support crucial dissemination efforts.
Publishing technical know-how is crucial for the progress of the R
community. R news has been circulating on Twitter, R Weekly and
R-Bloggers. Some sources have been more specifically
aimed at R package developers of various experience and interests.
“Writing R Extensions” is the official and exhaustive
reference on writing R packages, but it is a reference rather than a
learning resource. Many R package developers, if not learning by
example, are introduced to R package development via introductory blog
posts or tutorials. The R Packages book by Hadley Wickham and Jenny
Bryan (Wickham 2015; Wickham and Bryan, n.d.), which accompanies the
devtools suite of packages, is freely available online and strives to
improve the R package development experience. The rOpenSci guide
“rOpenSci Packages: Development, Maintenance, and Peer Review” (rOpenSci et al. 2021) contains our
community-contributed guidance on how to develop packages and review
them. It features opinionated requirements such as the use of
roxygen2 (Wickham et al. 2021) for package
documentation; criteria that help developers make informed decisions on
gray-area topics such as limiting dependencies; and advice on widely
accepted and emerging tools. As it is a living document also used
as a reference for editorial decisions, we maintain a changelog and
summarize each release in a blog post. rOpenSci also hosts
a book on a specialized topic, HTTP testing in R, that presents
principles for testing packages that interact with web resources, as
well as relevant packages. Besides these examples of long-form
documentation, knowledge around R software engineering is shared through
blogs and talks. In the R blogging world, the rOpenSci blog posts,
technical notes and a section of our monthly newsletter feature
some topics relevant to package developers, as do some of the posts on
the Tidyverse blog. The blog of the R-hub project
contains information on package development topics, in particular about
common problems such as sharing data via R packages or understanding
CRAN checks. Expert programmers have been sharing their R-specific
wisdom as well as software engineering lessons learned from other
languages (e.g. Jenny Bryan’s useR! keynote address “code feels, code
smells”).
In summary, we observe that there is already a strong software engineering culture in the R developer community. By surfacing this rich suite of resources to new developers, we hope to contribute to the future success of all the aforementioned initiatives. We recommend creating, updating and vetting packages with the tools we mentioned, as well as keeping up with community standards through the venues we mentioned. We invite contributions to the rOpenSci project, where participants can gain experience that will shape their work and that of their peers. Thanks to these efforts, we hope the R community will continue to be a thriving place for the application of software engineering by diverse practitioners from many different paths.