Abstract
Our Google Summer of Code project “Improvements to
nls()” investigated rationalizing R tools for nonlinear
regression and nonlinear estimation by considering usability,
maintainability, and functionality, especially for a Gauss-Newton
solver. The rich features of
nls() are weakened by several deficiencies and
inconsistencies such as a lack of stabilization of the Gauss-Newton
solver. Further considerations are the usability and maintainability of
the code base that provides the functionality nls() claims
to offer. Various packages, including our nlsr, provide
alternative capabilities. We consider the differences in goals,
approaches, and features of different tools for nonlinear least squares
modeling in R. Discussion of these matters is relevant to improving R
generally as well as its nonlinear estimation tools.
The nls() function, distributed with R (available from the Comprehensive R Archive Network, CRAN: https://cran.r-project.org), has
remarkable and wide-ranging features for estimating nonlinear
statistical models that are expressed as formulas.
The function dates to the 1980s
and to the work related to D. M. Bates and Watts
(1988) in S (see https://en.wikipedia.org/wiki/S_%28programming_language%29).
In particular, we note that it offers self-starting
(selfStart) models that do not
require initial parameter values. Due to its extensive range of features and prolonged history, the code has become untidy and overly patched, making it challenging to maintain and ripe for improvement in its underlying methods.
Besides the base-R nls() function, we will pay
particular attention to nlsr (John C. Nash and Murdoch (2023)),
minpack.lm (Elzhov et al.
(2012)), and gslnls (Chau
(2023)) which are general nonlinear least-squares solvers in the
CRAN repository. While we will provide capsule comments for some other
CRAN packages, those in the Bioconductor (Gentleman et al. (2004)) collection are more
specialized, and those on repositories such as GitHub (https://github.com) and Gitlab
(https://about.gitlab.com), while interesting, do not
have the checking applied to CRAN packages.
Our work aimed at unifying nonlinear modeling functionality in R,
ideally in a refactored
nls() function. The primary messages from this work
are:
For R users, we would advise that it is most
efficient to carry out nonlinear modeling or least squares by adapting
working scripts, preferably those
with documentation and using recent tools. If ill-conditioning is
suspected, package nlsr, or the example of computing singular values
of the Jacobian that we give in the section “Comparison notes for
formula-setup solutions” below, allows these diagnostics to be
calculated.
For R developers, we invite and encourage discussion of the design choices, since these have downstream implications for ease of use, adaptation to new features, and efficiency of ongoing maintenance.
onls (Spiess, Andrej-Nikolai
(2022)) is used for optimising and estimating nonlinear models by
minimizing the sum of squares of orthogonal residuals
rather than vertical residuals. The objective is therefore different and
involves nontrivial extra calculation. A vignette with the package and
the blog article https://www.r-bloggers.com/2015/01/introducing-orthogonal-nonlinear-least-squares-regression-in-r/
give some description with illustrative graphs. onls
appears to be limited to problems with one independent and one dependent
variable. The Wikipedia article https://en.wikipedia.org/wiki/Total_least_squares
presents an overview of some
ideas, with references to the literature. The approach needs wider
discussion and more tutorial examples than can be included here to
allow its merits to be judged.
crsnls (Tvrdík (2016)) –
This package allows nonlinear estimation by controlled random search via
two methods. There is unfortunately no vignette. A modest trial we
carried out showed nlsr::nlxb() gave the same results in a
small fraction of the time required by either of the methods in
crsnls. The method discussed in Josef Tvrdík and Ivan Křivý and Ladislav Mišík
(2007) claims better reliability in finding solutions than a
Levenberg-Marquardt code (actually from Matlab), but the tests were
conducted on the extreme NIST examples mentioned next.
NISTnls (D. M. Bates
(2012)) – This package provides R code and data for a set of
(numerically ill-conditioned) nonlinear least squares problems from the
U.S. National Institute for Standards and Technology. These may not
represent real-world situations.
nlshelper (Duursma (2017))
– This package, which unfortunately lacks a vignette, provides a few
utilities for summarizing, testing, and plotting non-linear regression
models estimated with nls(), nlsList() or nlme() that are linked or
grouped in some way.
nlsic (Sokol (2022)) –
This solves nonlinear least squares problems with optional equality
and/or inequality constraints. It is clearly not about modeling, and the
input and output are quite different from class nls
methods. However, there do not appear to be other R packages with these
capabilities.
nlsMicrobio (Baty and
Delignette-Muller (2014)) – Data sets and nonlinear regression
models dedicated to predictive microbiology, including a vignette, by
authors of the nlstools package.
nlstools (Baty and
Delignette-Muller (2013)) – This package provides several tools
for aiding the estimation of nonlinear models, particularly using
nls(). The vignette is actually a journal article, and the
authors have considerable experience in the subject.
nlsmsn (Prates, Lachos, and Garay
(2021)) – Fit univariate non-linear scale mixture of skew-normal
(NL-SMSN) regression, with details in Garay, Lachos, and Abanto-Valle
(2011). The problem here is to minimize an objective that is modified
from the traditional sum of squared residuals.
nls.multstart (Padfield and
Matheson (2020)) – Non-linear least squares regression using the
Levenberg-Marquardt algorithm with multiple starting values, compared
via AIC scores, to increase the chance that the minimum found is the
global minimum.
nls2 (Grothendieck (2022))
– Nonlinear least squares by brute force has similar motivations to
nls.multstart, but uses nls() within multiple
trials. The author has extensive expertise in R.
nlstac (Rodriguez-Arias et al.
(2020)) – A set of functions implementing the algorithm described
in Torvisco, Rodriguez-Arias, and Sanchez
(2018) for fitting separable nonlinear regression curves. The
special class of problem for which this package is intended is an
important and difficult one. No vignette is provided, unfortunately.
easynls – Fit and plot some nonlinear models. Thirteen
models are treated, but there is minimal documentation and no vignette.
Package nlraa is to be preferred.
nlraa (Miguez (2021)) – a
set of nonlinear selfStart models, primarily from agriculture.
Most include analytic Jacobian code.
optimx (John C. Nash and Varadhan
(2011)) – This provides optimizers that can be applied to
minimize a nonlinear function which could be a nonlinear sum of squares.
Not generally recommended if nonlinear least squares programs can be
easily used, but provides a check and alternative solvers.
The Hobbs weed infestation problem (John C. Nash (1979, 120)) is a growth-curve modeling task. Its very succinct statement provides the “short reproducible example” much requested on R mailing lists. The problem is small and seemingly straightforward, yet presents such difficulties that optimization researchers have asked if it is contrived rather than a real problem from a field researcher. The data and graph follow.
weed <- c(5.308, 7.24, 9.638, 12.866, 17.069, 23.192, 31.443,
38.558, 50.156, 62.948, 75.995, 91.972)
tt <- 1:12
weeddf <- data.frame(tt, weed)
plot(weeddf, main="Hobbs weed infestation data")
While model estimation is increasingly automated, nonlinear regression should use background knowledge and graphs to obtain an understanding of the general magnitude of the parameters. Indeed, the “visual fitting” approach (John C. Nash and Velleman (1996)) mentioned later accords with this viewpoint, as does use of the Logistic3T variant of the model, as well as discussions in Ross (1990), Seber and Wild (1989), D. M. Bates and Watts (1988), and Gallant (1987). Our emphasis on the software resilience to starting parameters that are, in the sense of statistical modeling, “silly” comes from over half a century of dealing with users whose interests and understanding are very far from those of statistical modelers. Thus, we seek methods and codes that obtain reasonable answers under highly unfavourable conditions. Nevertheless, a proper approach to nonlinear modeling is to apply all available knowledge to the task.
Three suggested models for this data are (with names to allow for easy reference)
Logistic3U: \[ y \approx b_1 / (1 + b_2 * exp(- b_3 * t)) \]
Logistic3S: \[ y \approx 100 * c_1 / (1 + 10 * c_2 * exp(- 0.1 * c_3 * t)) \]
Logistic3T: \[ y \approx Asym / (1 + exp((xmid - t)/scal)) \]
where we will use weed for \(y\) and tt for \(t\). The functions above are equivalent,
but the first is generally more awkward to solve numerically due to its
poor scaling. The parameters of the three forms are related as
follows:
\[ Asym = b_1 = 100 * c_1 \] \[ exp(xmid/scal) = b_2 = 10 * c_2 \] \[ 1/scal = b_3 = 0.1 * c_3 \] To allow for a simpler discussion, let us say that the parameters form a (named) vector \(p\) and the model function is called \(model(p)\), with \(r = y - model(p)\).
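As a numerical check of these relations, here is a small sketch that converts the unscaled estimates (the solution values reported later in this article) to the other two parameterizations:

b <- c(b1 = 196.186, b2 = 49.0916, b3 = 0.31357) # unscaled estimates
# scaled form: c1 = b1/100, c2 = b2/10, c3 = 10*b3
c(c1 = b[["b1"]] / 100, c2 = b[["b2"]] / 10, c3 = 10 * b[["b3"]])
# transformed form: Asym = b1, xmid = log(b2)/b3, scal = 1/b3
c(Asym = b[["b1"]], xmid = log(b[["b2"]]) / b[["b3"]], scal = 1 / b[["b3"]])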
We wish to minimize the sum of squared residuals, which is our loss (or objective) function. Starting with some guess for the parameters, we aim to alter these parameters to obtain a smaller loss function. We then iterate until we can make no further progress.
Let us consider there are \(n\) parameters and \(m\) residuals. The loss function is \[ S(p) = r' r = \sum_{i=1}^m { r_i^2 }\] The gradient of \(S(p)\) is \[ g = 2 * J' r\] where the Jacobian \(J\) is given by elements \[ J_{i,j} = \partial r_i / \partial p_j \] and the Hessian is defined by elements \[ H_{i,j} = \partial ^2 S(p) / {\partial p_i \partial p_j} \] If we expand the Hessian for nonlinear least squares problems, we find \[ 0.5 * H_{i,j} = \sum_{k = 1}^m{ J_{k,i} J_{k,j}} + \sum_{k = 1}^m {r_k * \partial ^2 r_k / {\partial p_i \partial p_j}}\] Let us use \(D_{i,j}\) for the elements of the second term of this expression.

What is generally called Newton’s method for function minimization tries to set the gradient to zero (to find a stationary point of the function \(S(p)\)). This leads to Newton’s equation \[ H \delta = -g \] Given a set of parameters \(p\), we solve this equation for \(\delta\), adjust \(p\) to \(p + \delta\), and iterate, hopefully converging on a solution. Applying this to a sum of squares problem gives \[ 0.5 * H \delta = (J'J + D) \delta = - J' r\] In this expression, only the elements of \(D\) involve second partial derivatives.

Gauss, attempting to model planetary orbits, had small residuals, and noted that these multiply the second partial derivatives of \(r\), so he approximated \[ 0.5 * H \approx J' J\] by assuming \(D \approx 0\). This results in the Gauss-Newton method, where we solve \[ J' J \delta = - J' r\] though we can avoid some loss of accuracy by not forming the inner-product matrix \(J' J\) and instead solving the linear least squares matrix problem \[ J \delta \approx -r \] by one of several matrix decomposition methods.
In reality, there are many problems where \(D\) should not be ignored, but the work to compute it precisely is considerable. Many work-arounds have been proposed, of which the Levenberg-Marquardt stabilization (Levenberg (1944), Marquardt (1963)) is the most commonly used. For convenience, we will use “Marquardt”, as we believe he first incorporated the ideas into a practical computer program.
The usual suggestion is that \(D\) be replaced by a multiple of the unit matrix or else a multiple of the diagonal part of \(J' J\). In low precision, some elements of \(J' J\) could underflow to zero (John C. Nash (1977)), and a linear combination of both choices is an effective compromise. Various choices for \(D\), as well as a possible line search along the direction \(\delta\) rather than a unit step (Hartley (1961)), give rise to several variant algorithms. “Marquardt’s method” is a family of methods. Fortunately, most choices work well.
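To make the linear algebra concrete, here is a minimal sketch of a single step of such a stabilized iteration. It is our illustration, not the code of any package discussed; marq_step, lambda, and phi are names we introduce.

marq_step <- function(p, rf, jf, lambda = 1e-4, phi = 1) {
  # one Marquardt step for residual function rf and Jacobian function jf
  r <- rf(p)
  J <- jf(p)
  A <- crossprod(J) # J' J
  # augment with a linear combination of diagonal and identity matrices
  Aaug <- A + lambda * (diag(diag(A)) + phi * diag(ncol(A)))
  delta <- solve(Aaug, -crossprod(J, r)) # solve (J'J + D) delta = -J' r
  p + as.vector(delta)
}

A full solver loops over such steps, increasing lambda when the sum of squares does not decrease and reducing it when a step succeeds.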
Let us specify in R the three model formulas and set some starting values for parameters. These starting points are not equivalent and are deliberately crude choices. Workers performing many calculations of a similar nature should try to provide good starting points to reduce computation time and avoid finding a false solution.
# model formulas
frmu <- weed ~ b1 / (1 + b2 * exp(-b3 * tt))
frms <- weed ~ 100 * c1/(1 + 10* c2* exp(-0.1 * c3* tt))
frmt <- weed ~ Asym / (1 + exp((xmid - tt) / scal))
#
# Starting parameter sets
stu1 <- c(b1 = 1, b2 = 1, b3 = 1)
sts1 <- c(c1 = 1, c2 = 1, c3 = 1)
stt1 <- c(Asym = 1, xmid = 1, scal = 1)
One of the useful features of nls() is the possibility
of a selfStart model, where starting parameter values are
not required. However, if a selfStart model is not
available, nls() sets all the starting parameters to 1.
This is tolerable but could be improved by using a set of values that
are all slightly different, which, in the case of the example model
\(y \,\sim\, a \,*\, exp(-b \,*\, x) +
c\,*\,exp(-d \,*\, x)\), would avoid the singular Jacobian that
arises when \(b\) and \(d\) are equal in value. Program
modifications to give a sequence like 1.0, 1.1, 1.2, 1.3 for the four
parameters are fairly obvious.
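For instance, under the hypothetical names pnames and stdflt:

pnames <- c("a", "b", "c", "d") # illustrative parameter names
stdflt <- setNames(1 + 0.1 * (seq_along(pnames) - 1), pnames)
stdflt # a = 1.0, b = 1.1, c = 1.2, d = 1.3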
It is also possible to provide R functions for the residual and Jacobian. This is usually much more work for the user if the formula setup is possible. To illustrate, we show the functions for the unscaled 3 parameter logistic. The particular form of these explicit residual and Jacobian functions comes from their translation from BASIC codes of the 1970s, as adapted in John C. Nash and Walker-Smith (1987). The use of the Laplace form of the residual and the inclusion of data within the functions reflects choices of that era that are at odds with current practice. Some users still want or need to provide problems as explicit functions, particularly for problems that are not regressions. For example, the Rosenbrock banana-valley test problem can be provided this way, where the two “residuals” are different functional forms.
# Logistic3U
hobbs.res <- function(x){ # unscaled Hobbs weeds problem -- residual
if(length(x) != 3) stop("hobbs.res -- parameter vector n!=3")
y <- c(5.308, 7.24, 9.638, 12.866, 17.069, 23.192, 31.443,
38.558, 50.156, 62.948, 75.995, 91.972)
tt <- 1:12
res <- x[1] / (1 + x[2] * exp(-x[3] * tt)) - y
# Note: this form of the residual, from Laplace (1788) in the form
# of "fitted - observed" has been used in our software for half
# a century, but sometimes concerns users of the more traditional
# "observed - fitted" form
res
}
hobbs.jac <- function(x) { # unscaled Hobbs weeds problem -- Jacobian
jj <- matrix(0.0, 12, 3)
tt <- 1:12
yy <- exp(-x[3] * tt)
zz <- 1.0 / (1 + x[2] * yy)
jj[tt, 1] <- zz
jj[tt, 2] <- -x[1] * zz * zz * yy
jj[tt, 3] <- x[1] * zz * zz * yy * x[2] * tt
attr(jj, "gradient") <- jj
jj
}
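As noted above, problems that are not regressions can also be posed via residual functions. A minimal sketch of the Rosenbrock banana-valley problem as two “residuals” of different functional forms follows; depending on the nlsr version, a Jacobian function or a numerical-approximation control may also be needed for the commented call.

rosen.res <- function(x) { # Rosenbrock problem as two "residuals"
  c(10 * (x[2] - x[1]^2), # first residual
    1 - x[1]) # second residual
}
# e.g., nlsr::nlfb(start = c(x1 = -1.2, x2 = 1), resfn = rosen.res)
# should approach the solution x1 = x2 = 1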
Using a formula specification was a principal advance made by
nls() when it became available in S in the 1980s.
It uses a Gauss-Newton (i.e., unstabilized) iteration with a step
reduction line search. This works very efficiently as long as \(J\) is not ill-conditioned. Below we see
nls() does poorly on the example problem. To save page
space, we use 1-line result display functions from package
nlsr, namely pnls() and
pshort().
## Error in nls(formula = frmu, start = stu1, data = weeddf) :
## singular gradient
## Error in nls(formula = frms, start = sts1, data = weeddf) :
## singular gradient
## Error in nls(formula = frmt, start = stt1, data = weeddf) :
## singular gradient
Here we see the infamous “singular gradient” termination message of
nls().
Users
should, of course, be using the SSlogis selfStart
tool, but ignorance of this possibility or a slight variant in the model
can easily lead to outcomes similar to those seen here.
## unlx1:residual sumsquares = 2.5873 on 12 observations
## after 19 Jacobian and 25 function evaluations
## name coeff SE tstat pval gradient JSingval
## b1 196.186 11.31 17.35 3.167e-08 -4.859e-09 1011
## b2 49.0916 1.688 29.08 3.284e-10 -3.099e-08 0.4605
## b3 0.31357 0.006863 45.69 5.768e-12 2.305e-06 0.04714
## snlx1 -- ss= 2.5873 : c1 = 1.9619 c2 = 4.9092 c3 = 3.1357; 34 res/ 23 jac
## tnlx1 -- ss= 2.5873 : Asym = 196.19 xmid = 12.417 scal = 3.1891; 36 res/ 27 jac
Though we have found solutions, the Jacobian is essentially singular
as shown by its singular values. Note that these are displayed by
package nlsr in a single column in the output to provide a
compact layout, but the values do not correspond to the individual
parameters in whose row they appear; they are a property of the whole
problem.
## unlm1 -- ss= 2.5873 : b1 = 196.19 b2 = 49.092 b3 = 0.31357; 17 itns
## snlm1 -- ss= 2.5873 : c1 = 1.9619 c2 = 4.9092 c3 = 3.1357; 7 itns
## Error in nlsModel(formula, mf, start, wts) :
## singular gradient matrix at initial parameter estimates
## ugslnls1 -- ss= 2.5873 : b1 = 196.19 b2 = 49.092 b3 = 0.31357; 26 itns
## sgslnls1 -- ss= 2.5873 : c1 = 1.9619 c2 = 4.9092 c3 = 3.1357; 11 itns
## tgslnls1 -- ss= 9205.4 : Asym = 35.532 xmid = 20892 scal = -1147.4; 58 itns
nlsr::nlxb() uses print() to output
standard errors and singular values of the Jacobian (for diagnostic
purposes). By contrast, minpack.lm::nlsLM() and
nls() use summary(), which does not display
the sum of squares, while print() gives the sum of squares,
but not the standard error of the residuals.
The singular values allow us to gauge how “nearly singular” the Jacobian is at the solution, and the ratio of the smallest to largest of the singular values is a simple but effective measure. The ratios are 4.6641e-05 for Logistic3U, 0.021022 for Logistic3S, and 0.001055 for Logistic3T, so Logistic3S is the “least singular”.
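These ratios can be computed directly from an nlxb() solution, which returns the Jacobian; we assume here that the returned element is named jacobian (check str() of the result for your nlsr version).

sv <- svd(unlx1$jacobian)$d # singular values at the solution
sv[length(sv)] / sv[1] # smallest/largest, ~4.66e-05 for Logistic3U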
The results from nlsLM and gsl_nls for the
transformed model Logistic3T have a very large sum of squares, which may
suggest that these programs have failed. Since nls(),
nlsLM(), and gsl_nls() do not offer singular
values, we need to extract the Jacobian and compute its singular values.
The following script shows how to do this, using as Jacobian what is
called the gradient element in the returned solution for
these solvers.
# for nlsLM
if (inherits(tnlm1, "try-error")) {
print("Cannot compute solution -- likely singular Jacobian")
} else {
JtnlsLM <- tnlm1$m$gradient() # actually the Jacobian
svd(JtnlsLM)$d # Singular values
}
## [1] "Cannot compute solution -- likely singular Jacobian"
# for gsl_nls
if (inherits(tgslnls1, "try-error")) {
cat("Cannot compute solution -- likely singular Jacobian")
} else {
JtnlsLM <- tgslnls1$m$gradient()
svd(JtnlsLM)$d # Singular values
}
## [1] 3.4641e+00 9.3067e-10 4.7032e-25
We see that there are differences in detail, but the more important
result is that two out of three singular values are essentially 0. Our
Jacobian is singular, and no method of the Gauss-Newton type should be
able to continue. Indeed, from the parameters reported at this saddle
point, nlsr::nlxb() cannot proceed.
stspecial <- c(Asym = 35.532, xmid = 43376, scal = -2935.4)
badstart <- try(nlxb(formula = frmt, start = stspecial, data = weeddf))
if (! inherits(badstart, "try-error")) print(badstart)
## residual sumsquares = 9205.4 on 12 observations
## after 2 Jacobian and 2 function evaluations
## name coeff SE tstat pval gradient JSingval
## Asym 35.5321 NA NA NA -9.694e-09 3.464
## xmid 43376 NA NA NA -1.742e-09 2.61e-10
## scal -2935.4 NA NA NA -2.4e-08 7.12e-16
We illustrate how to solve nonlinear least squares problems using a
function to define the residual. Note that gsl_nls()
requires a vector y of the same length as the vector
returned by the function supplied, e.g., hobbs.res().
gsl_nls uses a numerical approximation for the Jacobian if
the argument jac is missing. Note function
nlsr::pnlslm() for a 1-line display of the results of
minpack.lm::nls.lm().
## hobnlfb<-nlfb(start=stu1, resfn=hobbs.res, jacfn=hobbs.jac)
## hobnlfb -- ss= 2.5873 : b1 = 196.19 b2 = 49.092 b3 = 0.31357; 25 res/ 19 jac
## hobnlm<-nls.lm(par=stu1, fn=hobbs.res, jac=hobbs.jac)
## hobnlm -- ss= 2.5873 : b1 = 196.19 b2 = 49.092 b3 = 0.31357; 17 itns
## hobgsln<-gsl_nls(start=stu1, fn=hobbs.res, y=rep(0,12))
## hobgsln -- ss= 2.5873 : b1 = 196.19 b2 = 49.092 b3 = 0.31357; 26 itns
## hobgsl<-gsl_nls(start=stu1, fn=hobbs.res, y=rep(0,12), jac=hobbs.jac)
## hobgsl -- ss= 2.5873 : b1 = 196.19 b2 = 49.092 b3 = 0.31357; 26 itns
The output object of nlxb() is smaller than the class
nls object returned by nls(),
nlsLM(), and gsl_nls(). Package
nlsr emphasizes the solution of the nonlinear least squares
problem rather than the estimation of a nonlinear model that fits or
explains the data. The object of class nls allows for a
number of specialized modeling and diagnostic extensions. For
compatibility, the nlsr package has the function
wrapnlsr(), for which nlsr() is an alias. This
uses nlxb() to find good parameters, then calls
nls() to return the class nls object. Unless
particular modeling features are needed, the use of
wrapnlsr() is unnecessary and wasteful of resources.
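A sketch of this usage, with arguments named to avoid depending on their order (see ?wrapnlsr for the exact signature):

fitnls <- nlsr::wrapnlsr(formula = frmu, data = weeddf, start = stu1)
class(fitnls) # "nls", so methods such as predict() and profile() apply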
The design goals of the different tools may also be revealed in the
so-called “convergence tests” for the iterative solvers. In the manual
page for nls() in R 4.0.0, there was the warning:
> Do not use nls on artificial “zero-residual”
data
with the suggested addition
of small perturbations to the data. This admits that nls() could
not solve well-posed problems unless the data were polluted with errors.
Zero-residual problems are not always artificial, since problems in
function approximation and nonlinear equations can be approached with
nonlinear least squares. Fortunately, a small adjustment to the
“termination test” for the program, rather than for the “convergence” of
the underlying algorithm, fixes the defect. The test is the Relative
Offset Convergence Criterion (see D. M. Bates and
Watts (1981)). This scales an estimated reduction in the loss
function by its current value. If the loss function is very small, we
are close to a zero-divide. Adding a small quantity to the divisor
avoids trouble. In 2021, one of us (J. Nash) proposed that
nls.control() have an additional parameter
scaleOffset with a default value of zero. Setting it to a
small number – 1.0 is a reasonable choice – allows small-residual
problems (i.e., near-exact fits) to be dealt with easily. We call this
the safeguarded relative offset convergence criterion,
and it has been in nlsr since it was introduced. The
default value gives the legacy behavior. This improvement has been in
the R distributed code since version 4.1.0.
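A minimal sketch of the safeguard in action on an exact-fit problem; the legacy test may fail here because it divides by a near-zero deviance.

x <- 1:10
y <- 2 * exp(0.3 * x) # data generated exactly by the model
exdf <- data.frame(x, y)
t0 <- try(nls(y ~ a * exp(b * x), data = exdf, start = c(a = 1, b = 0.2)))
t1 <- nls(y ~ a * exp(b * x), data = exdf, start = c(a = 1, b = 0.2),
          control = nls.control(scaleOffset = 1)) # safeguarded test
coef(t1) # essentially a = 2, b = 0.3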
Additional termination tests can be used. nlsr has a
small sum of squares test
(smallsstest) that compares the latest evaluated sum of
squared (weighted) residuals to e4 times the initial sum of
squares, where e4 <- (100*.Machine$double.eps)^4 is
approximately 2.43e-55.
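For concreteness:

e4 <- (100 * .Machine$double.eps)^4 # about 2.43e-55
# nlsr terminates when the (weighted) sum of squares < e4 * initial value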
Termination after what may be considered excessive computation is
also important. nls() stops after maxiter
“iterations”. The meaning of “iteration” may require an examination of
the code for the different algorithms. nlsr terminates
execution when the number of residual or Jacobian evaluations exceed set
limits. Generally, we prefer larger limits than the default
maxiter = 50 of nls() to avoid stopping early,
though this may result in some unnecessary computations.
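For example (the nls() control is documented; the nlsr control names shown are our recollection and should be checked against the package documentation):

# fitn <- nls(frmu, start = stu1, data = weeddf,
#             control = nls.control(maxiter = 200))
# fitx <- nlxb(frmu, start = stu1, data = weeddf,
#              control = list(femax = 10000, jemax = 5000))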
As mentioned, the output of nls(),
minpack.lm::nlsLM(), or gslnls::gsl_nls() is
an object of class “nls” which has a quite rich structure described in
the manual files or revealed by applying the str() function
to the result of nls(). The complexity of this object is a
challenge to users. Let us use for example
result <- snlm1 as the returned object from
nlsLM() for the Logistic3S problem. The data
return element is an R symbol. To actually access the data from this
element, we need to use the syntax:
eval(parse(text = result$data))
However, if the call is made with model = TRUE, then
there is a returned element model which contains the data,
and we can list its contents using:
ls(result$model)
and if there is an element called xdata, then it can be
accessed as result$model$xdata.
By contrast, nlsr::nlxb() returns a much simpler
structure of 11 items in one level. Moreover, nlxb
explicitly returns the sum of squares, the residual vector, Jacobian,
and counts of evaluations.
Tools that produce a class nls output object create a
rich set of functions and structures that are then used in a variety of
modeling tasks, including the least squares solution. By contrast,
nlsr computes quantities as they are requested or needed,
with additional features in separate functions. For example, the
singular values of the Jacobian are actually computed in the
print and summary methods for the result.
These two approaches lead to different consequences for performance and
how features are provided. nlsr has antecedents in the
methods of John C. Nash (1979), where
storage for data and programs was at a ridiculous premium in the small
computers of the era. Thus, the code in nlsr is likely of
value for workers to copy and modify for customized tools.
Gauss-Newton/Marquardt methods all need a Jacobian matrix at each
iteration. By default, nlsr::nlxb() will try to evaluate
this using analytic expressions using symbolic and automatic
differentiation tools. When using a formula specification of the model,
nls(), minpack.lm::nlsLM() and
gslnls::gsl_nls() use a finite difference approximation to
compute the Jacobian, though gsl_nls() does have an option
to attempt symbolic expressions. Package nlsr provides, via
appropriate calling syntax, four numeric approximation options for the
Jacobian, with a further control ndstep for the size of the
step used in the approximation. These options allow programming choices
to be examined. Users can largely ignore them.
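A sketch of selecting one of these approximations; the option value "jafwd" for a forward difference is our recollection of the nlsr syntax, while ndstep is the documented step-size control.

# solfd <- nlxb(frmu, start = stu1, data = weeddf,
#               control = list(japprox = "jafwd", ndstep = 1e-7))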
Using the “gradient” attribute of the output of the Jacobian function
to hold the Jacobian matrix lets us embed this in the residual function
as well, so that the call to nlsr::nlfb() can be made with
the same name used for both residual and Jacobian function arguments,
as sketched below. This programming trick saves a lot of trouble for
the package developer, but it can be a nuisance for users trying to
understand the code.
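A sketch of the trick, reusing the hobbs.res() and hobbs.jac() functions defined earlier:

hobbs.rj <- function(x) { # residuals with Jacobian in "gradient" attribute
  res <- hobbs.res(x)
  attr(res, "gradient") <- hobbs.jac(x)
  res
}
hobrj <- nlsr::nlfb(start = stu1, resfn = hobbs.rj, jacfn = hobbs.rj)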
As far as we can understand the logic in nls(), the
Jacobian computation during parameter estimation is carried out within
the called C-language program and its wrapper R code function
numericDeriv(), part of
./src/library/stats/R/nls.R in the R distribution source
code. This is used to provide Jacobian information in the
nlsModel() and nlsModel.plinear() functions,
which are not exported for general use. gsl_nls() also
appears to use numericDeriv().
numericDeriv() uses a simple forward difference
approximation of derivatives, though a central difference approximation
can be specified in control parameters. We are unclear why
numericDeriv() in base R calls
C_numeric_deriv, as we were easily able to create a more
compact version entirely in R. See https://github.com/nashjc/RNonlinearLS/tree/main/DerivsNLS.
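For illustration, here is a simple forward-difference Jacobian entirely in R; this is our sketch, not the code at the link above.

fdjac <- function(p, rf, eps = 1e-7) {
  r0 <- rf(p)
  J <- matrix(0, length(r0), length(p))
  for (j in seq_along(p)) {
    h <- eps * (abs(p[j]) + eps) # relative step, safeguarded near zero
    pj <- p
    pj[j] <- pj[j] + h
    J[, j] <- (rf(pj) - r0) / h # forward difference for column j
  }
  J
}
# fdjac(stu1, hobbs.res) approximates hobbs.jac(stu1)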
minpack.lm::nlsLM() invokes numericDeriv()
in its local version of nlsModel(), but it appears to use
an internal approximate Jacobian code from the original Fortran
minpack code, namely, lmdif.f. Such
differences in approach can lead to different behavior, usually minor,
but sometimes annoying with ill-conditioned problems.
A pasture regrowth problem (Huet et al.
(2004), page 1, based on Ratkowsky
(1983)) has a poorly conditioned Jacobian and nls()
fails with “singular gradient”. Worse, numerical approximation to the
Jacobian gives the error “singular gradient matrix at initial parameter
estimates” for minpack.lm::nlsLM so that the Marquardt
stabilization is unable to take effect, while the analytic derivatives
of nlsr::nlxb give a solution.
Karl Schilling (private communication) provided an example where
a model specified with the formula y ~ a * (x ^ b) causes
nlsr::nlxb to fail because the partial derivative w.r.t.
b is a * (x ^ b * log(x)). If there is data
for which x = 0, the derivative is undefined, but the model
can be computed. In such cases, we observed that nls() and
minpack.lm::nlsLM found a solution, though this seems to be
a lucky accident.
Analytic Jacobian code can be provided to all the solvers discussed.
Most selfStart models that automatically provide starting
parameters also include such code. There is documentation in R of
selfStart models, but their construction is non-trivial. A
number of such models are included with base R in
./src/library/stats/R/zzModels.R, with package
nlraa (Miguez (2021))
providing a richer set. There are also some in the now-archived package
NRAIA. These provide the Jacobian in the “gradient”
attribute of the “one-sided” formula that defines each model, and these
Jacobians are often the analytic forms.
The nls() function, after computing the “right-hand
side” or rhs of the residual, checks to see if the
“gradient” attribute is defined, otherwise using
numericDeriv() to compute a Jacobian into that attribute.
This code is within the nlsModel() or
nlsModel.plinear() functions. The use of analytic Jacobians
almost certainly contributes to the good performance of
nls() on selfStart models.
The use of selfStart models with nlsr is
described in the “Introduction to nlsr” vignette. However, since
nlsr generally can use very crude starting values, we have
rarely needed them, though it should be pointed out that our work is
primarily diagnostic. If we were carrying out a large number of similar
estimations, such initial parameters are critical to efficiency.
In considering selfStart models, we noted that the
base-R function SSlogis is intended to solve problem
Logistic3T above. When this function is used via
getInitial() to find starting values, it actually calls
nls() with the ‘plinear’ algorithm and finds a (full)
solution. It then passes the solution coefficients to the default
algorithm unnecessarily. Moreover, the implicit double call is, in our
view, prone to creating errors in code maintenance. To provide simpler
starting parameters, the function SSlogisJN is now part of
the package nlsr, but is most useful for
nls().
Users may also want to profit from the Jacobian code of
selfStart models but supply explicit starting values other
than those suggested by getInitial(). This does not appear
to be possible with nls(). nlsr::nlxb() always
requires starting parameters; users can either obtain them from the
selfStart model via getInitial() or supply
explicit values, but in either case the formula used in the call to
nlxb() can still involve the selfStart
function.
We are also surprised that the analytic expressions for the Jacobian
(“gradient”) in the SSlogis function and others save
quantities in “hidden” variables, i.e., with names preceded by “.”.
These are then not displayed by the ls() command, making
them difficult to access by users who may wish to create their own
selfStart model via copy and edit. Interactive tools, such
as “visual fitting” (John C. Nash and Velleman
(1996)) might be worth considering as another way to find
starting parameters, but we know of no R capability of this type.
As a side note, the introduction of scaleOffset in R
4.1.0 to deal with the convergence test for small residual problems now
requires that the getInitial() function have dot-arguments
(...) in its argument list. This illustrates the
entanglement of many features in nls() that complicate its
maintenance and improvement.
For many problems, we know that parameters cannot be bigger or smaller than some externally known limits. Such limits should be built into models, but there are some important details for using the tools in R.
nls() can only impose bounds if the
algorithm = "port" argument is used in the call.
Unfortunately, the documentation warns us:
The algorithm = “port” code appears unfinished, and does not even check that the starting value is within the bounds. Use with caution, especially where bounds are supplied.
gsl_nls() does not offer bounds.
Bounds are part of the default method for package
nlsr.
nlsLM() includes bounds in the standard call, but we
have observed cases where it fails to get the correct answer. From an
examination of the code, we believe the authors have not taken into
account all possibilities, though all programs have some weakness
regarding constrained optimization in that programmers have to work with
assumptions on the scale of numbers, and some problems will be outside
the scope envisaged.
# Start MUST be feasible i.e. on or within bounds
anlshob1b <- try(nls(frms, start = sts1, data = weeddf, lower = c(0,0,0),
upper = c(2,6,3), algorithm = 'port'))
if (! inherits(anlshob1b, "try-error")) pnls(anlshob1b) # check the answer (short form)
## anlshob1b -- ss= 9.4726 : c1 = 2 c2 = 4.4332 c3 = 3; 12 itns
# nlsLM seems NOT to work with bounds in this example
anlsLM1b <- try(nlsLM(frms, start = sts1, data = weeddf, lower = c(0,0,0), upper = c(2,6,3)))
if (! inherits(anlsLM1b, "try-error")) pnls(anlsLM1b)
## anlsLM1b -- ss= 881.02 : c1 = 2 c2 = 6 c3 = 3; 2 itns
# also no warning if starting out of bounds, but gets a good answer!!
st4 <- c(c1 = 4, c2 = 4, c3 = 4)
anlsLMob <- try(nlsLM(frms, start = st4, data = weeddf, lower = c(0,0,0), upper = c(2,6,3)))
if (! inherits(anlsLMob, "try-error")) pnls(anlsLMob)
## anlsLMob -- ss= 9.4726 : c1 = 2 c2 = 4.4332 c3 = 3; 4 itns
# Try nlsr::nlxb()
anlx1b <- try(nlxb(frms, start = sts1, data = weeddf, lower = c(0,0,0), upper = c(2,6,3)))
if (! inherits(anlx1b, "try-error")) pshort(anlx1b)
## anlx1b -- ss= 9.4726 : c1 = 2 c2 = 4.4332 c3 = 3; 12 res/ 12 jac
Bounds on parameters raise some interesting questions about how
uncertainty in parameter estimates should be computed or reported. That
is, the traditional “standard errors” are generally taken to imply
symmetric intervals about the point estimate in which the parameter may
be expected to be found with some probability under certain assumptions.
Bounds change those assumptions and hence the interpretation of
estimates of uncertainty, whether by traditional approximations from the
\(J' J\) matrix or from methods
such as profile likelihood or bootstrap. At the time of writing,
nlsr::nlxb() does not compute standard errors nor their
derived statistics when bounds are active to avoid providing misleading
information.
Let us try to fix (mask) the first parameter in the first two example problems.
# Hobbsmaskx.R -- masks with formula specification of the problem
require(nlsr); require(minpack.lm); traceval <- FALSE
stu <- c(b1 = 200, b2 = 50, b3 = 0.3) # a default starting vector (named!)
sts <- c(c1 = 2, c2 = 5, c3 = 3) # a default scaled starting vector (named!)
# fix first parameter
anxbmsk1 <- try(nlxb(frmu, start = stu, data = weeddf, lower = c(200, 0, 0),
upper = c(200, 60, 3), trace=traceval))
if (! inherits(anxbmsk1, "try-error")) print(anxbmsk1)
## residual sumsquares = 2.6182 on 12 observations
## after 4 Jacobian and 4 function evaluations
## name coeff SE tstat pval gradient JSingval
## b1 200U M NA NA NA 0 NA
## b2 49.5108 1.12 44.21 8.421e-13 -2.887e-07 1022
## b3 0.311461 0.002278 136.8 1.073e-17 0.0001635 0.4569
anlM1 <- try(nlsLM(frmu, start = stu, data = weeddf, lower = c(200, 0, 0),
upper=c(200, 60, 3), trace = traceval))
if (! inherits(anlM1, "try-error")) pnls(anlM1)
## anlM1 -- ss= 2.6182 : b1 = 200 b2 = 49.511 b3 = 0.31146; 4 itns
anlsmsk1 <- try(nls(frmu, start = stu, data = weeddf, lower = c(200, 0, 0),
upper = c(200, 60, 3), algorithm = "port", trace = traceval))
if (! inherits(anlsmsk1, "try-error")) pnls(anlsmsk1)
## anlsmsk1 -- ss= 2.6182 : b1 = 200 b2 = 49.511 b3 = 0.31146; 5 itns
# Hobbs scaled problem with bounds, formula specification
anlxmsks1 <- try(nlxb(frms, start = sts, data = weeddf, lower = c(2, 0, 0),
upper = c(2, 6, 30)))
if (! inherits(anlxmsks1, "try-error")) print(anlxmsks1)
## residual sumsquares = 2.6182 on 12 observations
## after 4 Jacobian and 4 function evaluations
## name coeff SE tstat pval gradient JSingval
## c1 2U M NA NA NA 0 NA
## c2 4.95108 0.112 44.21 8.421e-13 -2.981e-06 104.2
## c3 3.11461 0.02278 136.8 1.073e-17 1.583e-05 4.482
anlshmsk1 <- try(nls(frms, start = sts, trace = traceval, data = weeddf,
lower = c(2, 0, 0), upper = c(2, 6, 30), algorithm = 'port'))
if (! inherits(anlshmsk1, "try-error")) pnls(anlshmsk1)
## anlshmsk1 -- ss= 2.6182 : c1 = 2 c2 = 4.9511 c3 = 3.1146; 5 itns
anlsLMmsks1 <- try(nlsLM(frms, start = sts, data = weeddf, lower = c(2,0,0),
upper = c(2,6,30)))
if (! inherits(anlsLMmsks1, "try-error")) pnls(anlsLMmsks1)
## anlsLMmsks1 -- ss= 2.6182 : c1 = 2 c2 = 4.9511 c3 = 3.1146; 4 itns
# Test with all parameters masked
anlxmskall <- try(nlxb(frms, start=sts, data=weeddf, lower=sts, upper=sts))
if (! inherits(anlxmskall, "try-error")) print(anlxmskall)
## residual sumsquares = 158.23 on 12 observations
## after 0 Jacobian and 1 function evaluations
## name coeff SE tstat pval gradient JSingval
## c1 2U M NA NA NA NA NA
## c2 5U M NA NA NA NA NA
## c3 3U M NA NA NA NA NA
nlsr has an output format that indicates the constraint
status of the parameter estimates. For nlsr, we have chosen
to suppress the calculation of approximate standard errors in the
parameters when constraints are active because their meaning under
constraints is unclear, though we believe this policy worthy of
discussion and further investigation.
All four major tools illustrated solve some variant of the
Gauss-Newton equations. nls() uses a modification of an
approach suggested by Hartley (1961),
while nlsr, gslnls, and
minpack.lm use flavors of Marquardt
(1963). gslnls offers an accelerated Marquardt
method and three alternative methods which we have not investigated.
Control settings for nlxb() or nlfb() allow
exploration of Hartley and Marquardt algorithm variants. In general, the
Levenberg-Marquardt stabilization is important in obtaining solutions in
methods of the Gauss-Newton family, as nls() terminates too
frequently and unnecessarily with singular gradient
errors.
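For example (control names as we recall them from nlsr, including its spelling lamda; consult the package documentation before use):

# Hartley-like variant: no Marquardt damping, fixed step reduction
# solh <- nlxb(frmu, start = stu1, data = weeddf,
#              control = list(lamda = 0, laminc = 0, stepredn = 0.2))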
An important choice made in developing nlsr was to code
entirely within the R programming language. nls() uses a
mix of R, C, and Fortran, as does minpack.lm.
gslnls is an R wrapper to various C-language routines in
the GNU Scientific Library (Galassi et al.
(2009)). Generally, keeping to a single programming language can
allow for easier maintenance and upgrades. It also avoids some work when
there are changes or upgrades to libraries for the non-R languages. R is
usually considered slower than most computing environments because it
keeps track of objects and because it is usually interpreted. In recent
years, the performance penalty for using code entirely in R has been
much reduced with the just-in-time compiler and other improvements.
All-R computation may now offer acceptable performance. In
nlsr, the performance cost of using R may be smaller than
that of its aggressive approach to finding a solution, which can cause
more iterations to be used.
nls() can be called without specifying the
data argument. In this case, it will search in the
available environments (i.e., workspaces) for suitable data objects. We
do not like this approach, but it is “the R way”. R allows users to
leave many objects in the default (.GlobalEnv) workspace.
Moreover, users have to actively suppress saving this workspace
(.RData) on exit, otherwise any such file in the path will
be loaded on startup. R users in our acquaintance avoid saving the
workspace because of lurking data and functions that may cause unwanted
results.
nls() and other class nls tools accept an
argument subset. This acts through the mediation of
model.frame, which is not obvious in the source code files
/src/library/stats/R/nls.R and
/src/library/stats/src/nls.C. Having subset at
the level of the call to a function like nls() saves
effort, but it does mean that the programmer of the solver needs to be
aware of the origin (and value) of objects such as the data, residuals
and Jacobian. By preference, we would implement subsetting by zero-value
weights, with observation counts (and degrees of freedom) computed via
the numbers of non-zero weights. Alternatively, we would extract a
working dataframe from the relevant elements in the original.
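A sketch of the zero-weights approach using the Hobbs data:

w <- rep(1, nrow(weeddf))
w[c(3, 7)] <- 0 # "drop" observations 3 and 7
fitw <- nlxb(frmu, start = stu1, data = weeddf, weights = w)
# effective number of observations: sum(w > 0)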
na.action is an argument to the nls()
function, but it does not appear obviously in the source code, often
being handled behind the scenes after referencing the option
na.action. This feature also changes the data supplied to
our nonlinear least squares solver. A useful, but possibly dated,
description is given in: https://stats.idre.ucla.edu/r/faq/how-does-r-handle-missing-values/.
The typical default action, which can be seen by using the command
getOption("na.action") is na.omit. This option
removes from computations any observations containing missing values
(i.e. any row of a data frame containing an NA). na.exclude
does much of the same for solver computations, but keeps the rows with
NA elements so that predictions are in the correct row position. We
recommend that workers test output to verify the behavior is as wanted.
See https://stats.stackexchange.com/questions/492955/should-i-use-na-omit-or-na-exclude-in-a-linear-model-in-r.
As with subset, our concern with na.action is
that users may be unaware of the effects of an option they may not even
be aware has been set. Should na.fail be the default?
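A sketch for checking the behavior; the fitting calls are commented since they duplicate examples given elsewhere.

getOption("na.action") # typically "na.omit"
wna <- weeddf; wna$weed[5] <- NA
# fom <- nls(weed ~ SSlogis(tt, Asym, xmid, scal), data = wna, na.action = na.omit)
# fex <- nls(weed ~ SSlogis(tt, Asym, xmid, scal), data = wna, na.action = na.exclude)
# predict(fex) keeps an NA in row 5; predict(fom) drops that row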
model is an argument to the nls() and
related functions, which is documented as:
model logical. If true, the model frame is returned as part of the object. Default is FALSE.
Indeed, the argument only gets used when nls() is about
to return its result object, and the element model is NULL
unless the calling argument model is TRUE. Using the same
name for both function argument and object element could be confusing.
Despite this, the model
frame is used within the function code in the form of the object
mf. We feel that users could benefit from more extensive
documentation and examples of its use since it is used to implement
features like subset.
All four main tools we consider here allow a weights
argument that specifies a vector of fixed weights the same length as the
number of residuals. Each residual is multiplied by the square root of
the corresponding weight. Where available, the values returned by the
residuals() function are weighted, and the
fitted() or predict() functions can be used to
compute raw residuals.
While fixed weights may be useful, there are many problems for which
we want weights that are determined at least partially from the model
parameters, for example, a measure of the standard deviation of
observations. Such dynamic weighting situations are discussed in the
vignette “Introduction to nlsr” of package nlsr in section
Weights that are functions of the model parameters.
minpack.lm offers a function wfct() to
facilitate such weighting. Care is advised in applying such ideas.
The function resid() (an alias for
residuals()) gives weighted residuals like
result$m$resid(). The function nlsModel()
allows us to compute residuals
for particular coefficient sets. We have had to extract
nlsModel() from the base R code and include it via a code
chunk (not echoed here for space) because it is not exported to the
working namespace. We could also explicitly source() this
code.
wts <- 0.5 ^ tt # simple weights
frmlogis <- weed ~ Asym / (1 + exp((xmid - tt)/scal))
Asym <- 1; xmid <- 1; scal <- 1
nowt <- try(nls(weed ~ SSlogis(tt, Asym, xmid, scal))) # UNWEIGHTED
if (! inherits(nowt, "try-error")) {
rnowt <- nowt$m$resid() # This has UNWEIGHTED residual and Jacobian. Does NOT take coefficients.
attr(rnowt, "gradient") <- NULL
} else rnowt <- NULL
rnowt
## [1] -0.011900 0.032755 -0.092030 -0.208782 -0.392634 0.057594 1.105728
## [8] -0.715786 0.107647 0.348396 -0.652592 0.287569
usewt <- try(nls(weed ~ SSlogis(tt, Asym, xmid, scal), weights = wts))
if (! inherits(usewt, "try-error")) {
rusewt <- usewt$m$resid() # WEIGHTED. Does NOT take coefficients.
attr(rusewt, "gradient") <- NULL
} else rusewt <- NULL
rusewt
## [1] 0.0085640 0.0324442 -0.0176652 -0.0388479 -0.0579575 0.0163623
## [7] 0.1042380 -0.0411766 0.0052509 0.0084324 -0.0194246 -0.0024053
## source("nlsModel.R") # or use {r nlsmodelsource, echo=FALSE} code chunk
nmod0 <- nlsModel(frmlogis, data = weeddf, start = c(Asym = 1, xmid = 1, scal = 1), wts = wts)
rn0 <- nmod0$resid() # Parameters are supplied in nlsModel() `start` above.
attr(rn0, "gradient") <- NULL; rn0 # weighted residuals at starting coefficients
## [1] 3.3998 3.2545 3.0961 2.9784 2.8438 2.7748 2.6910 2.3474 2.1724 1.9359
## [11] 1.6572 1.4214
nmod <- nlsModel(frmlogis, data = weeddf, start = coef(usewt), wts = wts)
rn <- nmod$resid()
attr(rn,"gradient")<-NULL; rn # same as rusewt
## [1] 0.0085640 0.0324442 -0.0176652 -0.0388479 -0.0579575 0.0163623
## [7] 0.1042380 -0.0411766 0.0052509 0.0084324 -0.0194246 -0.0024053
As the nls() man page states, when the
“port” algorithm is used with the trace argument TRUE, the
iterations display the objective function value which is 1/2 the sum of
squares (or deviance). The trace display is likely embedded in the
Fortran of the nlminb routine that is called to execute the
“port” algorithm, but the factor of 2 discrepancy is nonetheless
unfortunate for users.
If nls() reaches a point where it cannot continue but
has not found a point where the relative offset convergence criterion is
met, it may simply exit, especially if a “singular gradient” (singular
Jacobian) is found. However, this may occur AFTER the function has made
considerable progress in reducing the sum of squared residuals.
Here is an
abbreviated example:
time <- c( 1, 2, 3, 4, 6 , 8, 10, 12, 16)
conc <- c( 0.7, 1.2, 1.4, 1.4, 1.1, 0.8, 0.6, 0.5, 0.3)
NLSdata <- data.frame(time,conc)
NLSstart <- c(lrc1 = -2, lrc2 = 0.25, A1 = 150, A2 = 50) # a starting vector (named!)
NLSformula <- conc ~ A1 * exp(-exp(lrc1) * time) + A2 * exp(-exp(lrc2) * time)
tryit <- try(nls(NLSformula, data = NLSdata, start = NLSstart, trace = TRUE))
## 61216. (3.56e+03): par = (-2 0.25 150 50)
## 2.1757 (2.23e+01): par = (-1.9991 0.31711 2.6182 -1.3668)
## 1.6211 (7.14e+00): par = (-1.9605 -2.6203 2.5753 -0.55599)
## Error in nls(NLSformula, data = NLSdata, start = NLSstart, trace = TRUE) :
## singular gradient
if (! inherits(tryit, "try-error")) print(tryit)
Note that the sum of squares has been reduced from 61216 to 1.6211,
but unless trace is invoked, the user will not get any
information about this. Changing this in the nls() function
could be useful to R users, and would be almost trivial.
The variable projection method (Golub and
Pereyra (1973), O’Leary and Rust
(2013)) is usually much more effective than general approaches in
finding good solutions to nonlinear least squares problems when some of
the parameters appear linearly. In our logistic examples, the asymptote
parameters are an illustration. However, identifying which parameters
are linear and communicating this information to estimating functions is
not a trivial task. nls() has an option
algorithm = "plinear" that allows some partially linear
models to be solved. The other tools, as far as we are aware, do not
offer any such capability. The nlstac package uses a
different algorithm for similar goals.
Within nls() itself we must, unfortunately, use
different specifications with different algorithm options.
For example, the explicit model y ~ a * x + b does not work
with the linear modeling function lm(), which requires this
model to be specified as y ~ x. Within nls(),
consider the following FOUR different specifications for the same
problem, plus an intuitive choice, labeled fm2a, that does
not work. In this failed attempt, putting the Asym
parameter in the model causes the plinear algorithm to try
to add another term to the model. We believe this is unfortunate, and
would like to see a consistent syntax. At the time of writing, we do not
foresee a resolution for this issue. In the example, we have not
evaluated the commands to save space.
DNase1 <- subset(DNase, Run == 1) # select the data
## using a selfStart model - do not specify the starting parameters
fm1 <- try(nls(density ~ SSlogis(log(conc), Asym, xmid, scal), DNase1))
if (! inherits(fm1, "try-error")) summary(fm1)
## using conditional linearity - leave out the Asym parameter
fm2 <- try(nls(density ~ 1 / (1 + exp((xmid - log(conc)) / scal)),
data = DNase1, start = list(xmid = 0, scal = 1),
algorithm = "plinear"))
if (! inherits(fm2, "try-error")) summary(fm2)
## without conditional linearity
fm3 <- try(nls(density ~ Asym / (1 + exp((xmid - log(conc)) / scal)),
data = DNase1,
start = list(Asym = 3, xmid = 0, scal = 1)))
if (! inherits(fm3, "try-error")) summary(fm3)
## using Port's nl2sol algorithm
fm4 <- try(nls(density ~ Asym / (1 + exp((xmid - log(conc)) / scal)),
data = DNase1, start = list(Asym = 3, xmid = 0, scal = 1),
algorithm = "port"))
if (! inherits(fm4, "try-error")) summary(fm4)
## using conditional linearity AND Asym does not work
fm2a <- try(nls(density ~ Asym / (1 + exp((xmid - log(conc)) / scal)),
data = DNase1, start = list(Asym=3, xmid = 0, scal = 1),
algorithm = "plinear", trace = TRUE))
if (! inherits(fm2a, "try-error")) summary(fm2a)
Some models have several common parameters and others that are tied
to particular cases. The man file for
nls() includes an example of a situation in which
parameters are indexed but which uses the “plinear” option as an added
complication. Running this example reveals that the answers for the
parameters are not indexed as in a vector.
That is, we do not see a[1], a[2], a[3] but
a1, a2, a3.
This is no doubt because programming for indexed parameters is
challenging.
We note that there are
capabilities in packages for mixed-effects modeling such as
nlme (Pinheiro et al.
(2013)), bbmle (Bolker and
Team (2013)), and lme4 (D.
Bates et al. (2015)) for estimating models of such type.
Maintainers of packages need suitable tests and use-case examples in order to verify that changes do not degrade existing functionality.
Such goals align with the aims of unit testing (e.g., https://towardsdatascience.com/unit-testing-in-r-68ab9cc8d211, Wickham (2011), Wickham et al. (2021) and the conventional R package testing tools). In our work, one of us (AB) has developed a working prototype package at https://github.com/ArkaB-DS/nlsCompare . A primary design objective of this is to allow the summarization of multiple tests in a compact output. The prototype has a vignette to illustrate its use.
In our investigation, we built several resources, which are now part of the repository https://github.com/nashjc/RNonlinearLS/. Particular items are:
A BibTex bibliography for use with all documents in this project, but which has wider application to nonlinear least squares projects in general (https://github.com/nashjc/RNonlinearLS/blob/main/BibSupport/ImproveNLS.bib).
MachID.R offers a suggested concise summary function
to identify a particular computational system used for tests. A
discussion of how it was built and the resources needed across platforms
is given at https://github.com/nashjc/RNonlinearLS/tree/main/MachineSummary.
John C. Nash and Bhattacharjee
(2022) is an explanation of the construction of the
nlspkg from the nls() code in R-base.
As the 2021 Summer of Code period was ending, one of us (JN) was invited to prepare a review of optimization in R. Ideas from the present work have been instrumental in the creation of John C. Nash (2022).
Given its importance to R, it is possible that nls()
will remain more or less as it has been for the past several decades. If
so, the focus of discussion should be the measures needed to secure its
continued operation for legacy purposes and how that may be
accomplished. We welcome an opportunity to participate in such
conversations.
To advance the stability and maintainability of R, we believe the
program objects (R functions) that are created by tools such as
nls() should have minimal cross-linkages and side-effects.
The aspects of nls() that concern us follow.
R tools presume data and parameters needed are available in an
accessible environment. This provides a compact syntax to invoke the
calculations, but the wrong data can be used if the internal search
finds a valid name that is not the object we want, or if
subset or na.action settings modify the
selection, or weights are applied.
Mixing of R, C and Fortran code adds to the burden of following the program logic.
The class nls structure simplifies calls with its
rich set of functions, but also adds to the task of understanding what
has been done. A design that isolates the setup, solution, and
post-solution parts of complicated calculations reduces the number of
objects that must be kept in alignment.
Given the existence of examples of good practices such as analytic derivatives, stabilized solution of Gauss-Newton equations, and bounds-constrained parameters, base R tools should be moving to incorporate them. These capabilities are available now by using several tools, but it would be helpful if they were unified.
Hans Werner Borchers was helpful in developing the GSoC project
motivating this article and in comments on this and related work.
Heather Turner co-mentored the project and helped guide the progress of
the work. Exchanges with Fernando Miguez helped to clarify aspects of
selfStart models and instigated the “Introduction to nlsr”
vignette. Colin Gillespie (package benchmarkme) has been
helpful in guiding our attempts to succinctly summarize computing
environments. We thank one of the referees for pointing out that it is
important to use background knowledge and graphs to guide modeling.
Simon Urbanek gave some important practical help with preparing files
for publication.