Rejoinder: Software Engineering and R Programming

Abstract

It is a pleasure to take part in such fruitful discussion about the relationship between Software Engineering and R programming, and what could be gain by allowing each to look more closely at the other. Several discussants make valuable arguments that ought to be further discussed.

The Roles

It is worth arguing about the difference between research software engineers and software engineering researchers. While the former can be anyone developing scientific software for computation/data sciences (regardless of their technical background or "home" discipline), the latter are academics investigating software engineering in different domains.

Software engineering researchers aim to produce research that is translatable and usable by practitioners, and when investigating R programming (or any other type of scientific software) the "practitioners" are research software engineers. This distinction is relevant as one cannot work without the other. In other words, software engineering researchers ought to study research software engineers such like they study, e.g., a web developer, with the goal of uncovering their "pain points" and propose a solution to it. Likewise, research software engineers depend on software engineering researchers and expect them to produce the new knowledge they need.

However, what a research software engineer will vary by the programming language they use, and what they aim to achieve with it. In terms of R programming, as one discussant pointed, there can be a difference between an "R user" (which uses R to perform data analysis) and an "R developer" (which besides using the language, also develops it by creating publicly shared packages). However, to this extent, research has used both terms interchangeably, which leads to a possible avenue of work in terms of "human aspects of R programming".

The Software

This is where the next link appears–the tools and packages mentioned in the commentaries were developed with the intention of translating/migrating knowledge acquired/produced by software engineering researchers to the domain of R programming, and to be used by research software engineers. For example, the package covr streamlines the process of calculating the unit testing coverage of a package, and the original papers presenting such measures can be tracked down to the late "80s (Frankl and Weyuker 1988; DeMillo 1987). Albeit it is known coverage as a measure evolved and changed over time (and continues to do so), it is an excellent example of the outcome produced by software engineering researchers that successfully translated their findings to "practitioners" (in this case, research software engineers).

Therefore, a package is part of the "translation" of the knowledge acquired through software engineering research, into an accessible, usable framework. However, the tool itself is not enough–without the "environment" changing, growing, and learning, the tool may not be used to its full potential. Note that "environment" is used to refer (widely and loosely) to a person’s programming habits, acceptance to change, past experiences (e.g., time/effort spent in solving a bug, or domains worked on), and even the people around them (e.g., doing/not doing something because of what others do/do not do) that influence their vision, attitude and expectations regarding programming.

Moreover tools and packages are not finite, static elements–because they are software, they evolve. And when the requirements of a community change, so must do so the tools. This act as a reminder to not assign a "silver bullet" status to a tool meant to solve a particular, static problem, when it has been known that software (and thus the practices to develop it) evolve, and may even become unmanageable, never to be fully solved (Brooks 1987).

The Goal

Another related aspect is that "scientific software" has broader, different goals than "traditional" (namely, non-scientific) software development–it has been argued that "scientific software development" is concerned with knowledge acquisition rather than software production (Kelly 2015); e.g. a "tool" can be an RMarkdown document that allows performing an analysis (hence, using the language). Related to this, "scientific software" uses diverse paradigms, such as literate programming (which has been considered a programming paradigm for a few decades (Cordes and Brown 1991)) and scripting (which in turn, continues to elicit mixed stances from software engineering researchers (Loui 2008)) with goals different to "traditional software".

Thus, what "software engineering practices" mean for "scientific software" remains ambiguous, and some authors have argued that the "gap" between software engineering and scientific programming threatens the production of reliable scientific results (Storer 2017). The following are some example questions meant to illustrate how these other aspects of "scientific software" may still be related to software engineering practices:

Could text in a literate programming file be considered documentation? Is scripting subjected to code-smell practices like incorrect naming or code reuse? Does self-admitted technical debt exists in literate/scripting programming? What is the usability of a literate programming document? Should analytical scripts be meant for reuse?

The original article was intended to highlight some of the efforts made by software engineering researchers to bridge this gap of software engineering knowledge for "scientific programming". Nonetheless, software engineering researchers have perhaps focused more strongly on R packages because of their similarities to their current research (namely, "traditional software" development), thus making the translation of knowledge slightly more straightforward. Approaching other aspects, paradigms, tools and process of "scientific software" development still remains a gap on research that should be further studied.

The Community

The community is the next link in this chain–they motivate software engineering researchers" investigations, are the subjects, and the beneficiaries. Yet many times, they can also be the cause of their own "pain points". For example, research has shown that although StackOverflow is nowadays a staple for any programmer, many solutions derived from it can be outright insecure (Rahman, Farhana, and Imtiaz 2019; Fischer et al. 2017; Acar et al. 2016), have poor quality and code smells (T. Zhang et al. 2018; Meldrum et al. 2020), be outdated (H. Zhang 2020; Zerouali, Velázquez-Rodríguez, and De Roover 2021), or have low performance (Toro 2021), among others. This is but a facet of the concept of "there is no silver bullet" (Brooks 1987), and the only way of solving such situation (partially, and temporarily) is to look at it from multiple points of views. This action is what the original paper aimed to highlight.

Final words

In the end, the differences between software engineering researchers and research software engineers are blurry, and the translation of concepts from "traditional software" development/research to "scientific software" development/research may not be as straightforward as both groups of stakeholders consider. However, for the R community to continue evolving, both can (and should) work together and learn from the other.