Abstract
This paper is dedicated to the R package FMM which implements a novel approach to describe rhythmic patterns in oscillatory signals. The frequency modulated (FMM) model is defined as a parametric signal plus a Gaussian noise, where the signal can be described as a single or a sum of waves. The FMM approach is flexible enough to describe a great variety of rhythmic patterns. The FMM package includes all required functions to fit and explore single and multi-wave FMM models, as well as a restricted version that allows equality constraints between parameters representing a priori knowledge about the shape to be included. Moreover, the FMM package can generate synthetic data and visualize the results of the fitting process. The potential of this methodology is illustrated with examples of such biological oscillations as the circadian rhythm in gene expression, the electrical activity of the heartbeat and the neuronal activity.Oscillations naturally occur in a multitude of physical, chemical, biological, and even economic and social processes. Periodic signals appear, for example, during the cell-cycle, in biological time-keeping processes, in human heartbeats, in neuronal signals, in light emissions from certain types of stars, or in business cycles in economics, among many others. Three features typically describe the periodic nature of the oscillatory motion: period, amplitude and phase. The period is the time required for one complete oscillation. Within a period, a sum of monocomponent models, characterized by the phase and amplitude parameters, can be used to describe the rhythmic pattern of a signal (Boashash 2016). By varying the number of monocomponents and considering phase and amplitude parameters as fixed or variable, a large number of rhythmic signal representations can be found.
One of the most popular representations of oscillating signals is the
Fourier decomposition (FD): a multicomponent representation with a fixed
amplitude parameter. Its monocomponent version, the cosinor model (COS)
(Cornelissen 2014), is widely used, in
particular in chronobiology, with acceptable results when a sinusoidal
shape response within a period is expected. Due to its widespread use,
many software utilities are available. Particularly in R, the estimation
of a COS model can be performed using cosinor
(Sachs 2014) and cosinor2
packages (Mutak 2018). In addition, other
packages from widely differing areas of knowledge have specific
functions for fitting COS models. Such is the case of, for example, the
function CATCosinor in the CATkit
package (Gierke, Helget, and
Cornelissen-Guillaume 2018), which implements tools for
periodicity analysis; the function cosinor in the psych
package (Revelle 2021), dedicated to
personality and psychological research; or the function
cosinor contained in a recent package, card (Shah 2020), which is dedicated to the
assessment of the regulation of cardiovascular physiology. Recently, it
has also been implemented in other languages such as CosinorPy, a
cosinor python package (Moskon 2020). The
COS model is easy to use and interpret with symmetrical patterns.
However, asymmetric shapes are not captured properly by COS. When the
waveform is nonsinusoidal, the use of multiple components analysis to
fit a model consisting of a sum of several periodical functions is
recommended. However, the multicomponent FD models, developed to provide
flexibility from COS, often require the use of a large number of
components resulting in serious overfitting issues.
In recent years, alternative methods, mostly nonparametric statistical methods, have been developed and used for analyzing rhythmicity, especially in biological data sets. Some very popular ones, such as the JTK_CYCLE (Hughes, Hogenesch, and Kornacker 2010), wrongly assume that any underlying rhythms have symmetric waveforms. Others, such as RAIN (Thaben and Westermark 2014), designed to detect more diverse wave shapes including asymmetric patterns, are not focused on modeling but on detecting rhythmic behavior in sets of data. Thus, they are not useful to describe the underlying oscillatory phenomena. The proliferation of methodology in this field has been accompanied by software developments. This is the case, for example, of the DiscoRhythm R package (Carlucci et al. 2020), very recently available on Bioconductor with a web interface based on the R Shiny platform (Chang et al. 2021). This tool allows four popular approaches to be used, including the COS model and JTK_CYCLE, to discover biological rhythmicity. Another recent example is the circacompare (Parsons et al. 2020), an R package implemented for modeling cosinusoidal curves by nonlinear regression. Hosted on GitHub, we can also find the LimoRhyde R package (https://github.com/hugheylab/limorhyde) for the differential analysis of rhythmic transcriptome data, based on fitting linear models (Singer and Hughey 2019).
Motivated by the need for a flexible, interpretable and parametric methodology to fit rhythmic patterns, our research group recently proposed the frequency modulated (FMM) model (Rueda, Larriba, and Peddada 2019). The FMM is an additive nonlinear parametric regression model capable of adapting to nonsinusoidal shapes and whose parameters are easily interpretable. The single component model has been shown to successfully fit data as diverse as circadian clock signals, hormonal levels data or light data from distant stars. In addition, for more complex oscillatory signals, a multicomponent model of order \(m\), denoted as FMM\(_m\), which includes \(m\) single FMM components, can be used. This is, for example, the case for describing electrocardiography (ECG) signals. The FMM\(_{ecg}\) signal, presented in Rueda, Larriba, and Lamela (2021), is defined as the combination of five single FMM components. Another interesting area where the FMM approach has already shown its usefulness is in electrophysiological neuroscience. Specifically, we have proposed FMM methodology for modeling neuronal action potential (AP) curves, oscillating signals that measure the difference between the electrical potential inside and outside the cell (see Rueda, Rodríguez-Collado, and Larriba 2021; Rodrı́guez-Collado and Rueda 2021a). An FMM\(_2\) model provides an accurate fitting for a single AP curve; whereas series of AP curves with similar repetitive spikes can be efficiently fitted by the FMM\(_{ST}\) model, a restricted version of the multicomponent FMM model.
In this work we introduce the FMM package (Fernández et al. 2021), programmed in R and available from the Comprehensive R Archive Network (CRAN) at http://CRAN.R-project.org/package=FMM. The package implements all required functions to fit and explore single and multicomponent FMM models, as well as a restricted multicomponent version. In addition, the FMM package provides functions to generate synthetic data and visualize the results of the fitted model. Furthermore, its use is illustrated in the aforementioned applications. The remainder of this paper is organized as follows: the next section provides a brief overview of both mono and multicomponent FMM models, as well as the FMM\(_m\) model with equality constraints. The section follows is dedicated to the implementation details of the FMM package. After that, through a simulated example, the basic usage of the package is introduced, including the data generation and the fitting, as well as the visualization of the results. Then, the FMM package performance is shown through three application areas governed by oscillatory systems: chronobiology, ECG and neuroscience. Finally, a summary is provided.
FMM is a new approach to describe a great variety of rhythmic patterns in oscillatory signals as the composition of several additive components. In this section an overview of the FMM approach is provided. All the methodological details that justify the mathematical formulation of the FMM models are given in (Rueda, Larriba, and Peddada 2019).
At the time point \(t\), a single FMM wave is defined as \(W\left(t; \upsilon\right) = A \cos\left(\phi\left(t; \alpha, \beta, \omega\right)\right)\) where \(\upsilon = \left(A, \alpha, \beta, \omega\right)^{\prime}\), \(A \in \Re^{+}\) represents the wave amplitude and, \[\label{eq:phase} \phi\left(t; \alpha, \beta, \omega\right)= \beta + 2\arctan\left(\omega \tan\left(\frac{t - \alpha}{2}\right)\right) (\#eq:phase)\] the wave phase. The phase angle \(\phi\) of an FMM wave is defined using the link (see Downs and Mardia 2002; Kato, Shimizu, and Shieh 2008) rather than the linear link function as in the COS model. The link provides much more flexibility to describe nonsinusoidal patterns. Without loss of generality, we assume that the time point \(t \in \left[0, 2\pi\right]\). Otherwise it can be transformed into \(t^{\prime} \in \left[t_0, T + t_0\right]\) by \(t = \frac{\left(t^{\prime} - t_0\right)2\pi}{T}\).
Each of the four parameters of an FMM wave characterizes some aspect of a rhythmic pattern. \(A\) describes the amplitude of the signal, while \(\alpha\), \(\beta\) and \(\omega\) describe the wave phase. \(\alpha \in \left[0, 2\pi\right]\) is a translation parameter and a wave location parameter in the real space, whereas \(\beta \in \left[0, 2\pi\right]\) and \(\omega \in \left[0, 1\right]\) describe the wave shape. To be precise, assuming \(\alpha = 0\), the unimodal symmetric waves are characterized by values of \(\beta\) close to \(0\), \(\pi\) or \(2\pi\). When \(\beta = \frac{\pi}{2}\) or \(\beta = \frac{3\pi}{2}\), extreme asymmetric patterns are described. Moreover, a value of \(\omega\) close to zero describes an extreme spiked wave and, as \(\omega\) value increases, the pattern is increasingly smoother. When \(\omega = 1\), a sinusoidal wave is described and the FMM model matches the COS model where \(\varphi = \beta - \alpha\) is the acrophase parameter.
Two important features of a wave are the peak and trough, defined as the highest and lowest points above and below the rest position, respectively. In many applications, the peak and trough times could be very useful tools to extract practical information of a wave, since they capture important aspects of the dynamics. These two interesting parameters can be directly derived from the main parameters of an FMM wave as, \[\begin{aligned} \label{eq:monoFMM:fiducial} t^U & = \alpha + 2\arctan\left(\frac{1}{\omega}\tan\left(-\frac{\beta}{2}\right)\right) \\ t^L & = \alpha + 2\arctan\left(\frac{1}{\omega}\tan\left(\frac{\pi-\beta}{2}\right)\right) \nonumber \end{aligned} (\#eq:monoFMMfiducial)\] where \(t^U\) and \(t^L\) denote the peak and trough times, respectively.
Let \(X\left(t_i\right)\), \(t_1 < t_2 < \dots < t_n\) be the vector of observations. The monocomponent FMM model is defined as follows: \[\label{eq:monoFMM} X\left(t_i\right) = M + W\left(t_i; \upsilon\right) + e\left(t_i\right), \quad i = 1,\dots,n (\#eq:monoFMM)\] where \(M \in \Re\) is an intercept parameter describing the baseline level of the signal, \(W\left(t_i; \upsilon\right)\) is an FMM wave, and it is assumed that the errors \(e\left(t_i\right)\) are independent and normally distributed with zero mean and a common variance \(\sigma^2\).
A two-step algorithm to estimate monocomponent FMM model parameters is proposed. We now describe the substantial details of each stage of the algorithm.
Step 1: Initial parameter estimation. A two-way grid search over the choice of \(\left(\alpha , \omega\right)\) parameters is performed. For each pair of \(\left(\alpha , \omega\right)\) fixed values, the estimates for \(M\), \(A\) and \(\beta\) are obtained by solving a least square problem as detailed below.
The model for a single FMM component can be written as: \[\label{eq:monoFMM:muiParam} X\left(t_i\right) = M + A\cos\left(t_{i}^{*} + \varphi\right)+ e\left(t_i\right) (\#eq:monoFMMmuiParam)\] where \(t_{i}^{*} = \alpha + 2\arctan\left(\omega \tan\left(\frac{t_i - \alpha}{2}\right)\right)\), \(\varphi = \beta - \alpha\), and \(e\left(t_i\right) \sim N\left(0, \sigma^2\right)\) for \(i = 1, \dots, n\).
Using trigonometric angle sum identity, the model can be rewritten as: \[\label{eq:monoFMM:reWrite} X\left(t_i\right) = M + \delta z_i + \gamma w_i + e\left(t_i\right) (\#eq:monoFMMreWrite)\] where \(\delta = A \cos\left(\varphi \right)\), \(\gamma = -A \sin\left(\varphi \right)\), \(z_i = \cos\left(t_i^*\right)\) and \(w_i = \sin\left(t_i^*\right)\).
Since \(\alpha\) and \(\omega\) are fixed, the estimates for \(M\), \(\delta\) and \(\gamma\) are obtained by minimizing the residual sum of the squares (RSS), \[\label{eq:monoFMM:SSR} RSS = \sum_{i=1}^{n} \left(X\left(t_i\right)-\left(\hat{M} + \hat{\delta} z_i + \hat{\gamma} w_i\right)\right)^2 (\#eq:monoFMMSSR)\]
And the estimates for \(M\), \(A\) and \(\beta\) are straightforward to derive as follows, \[\begin{aligned} \label{eq:monoFMM:estimates} \hat{M} & = \bar{X} - \hat{\delta} \sum_{i=1}^{n} z_i - \hat{\gamma} \sum_{i=1}^{n} w_i\\ \hat{A} & = \sqrt{\hat{\delta}^2 + \hat{\gamma}^2} \\ \hat{\beta} & = \alpha + \varphi \end{aligned} (\#eq:monoFMMestimates)\]
The best combination of \(\left(\alpha, \omega \right)\) values, with the lowest RSS, is retained and the corresponding estimates are the initial parameter estimation values.
Step 2: Optimization. In the second step, the Nelder-Mead optimization method (Nelder and Mead 1965) is used to obtain the final FMM parameter estimates that minimize the RSS.
A multicomponent FMM model of order \(m\), denoted by FMM\(_m\), is defined as \[\begin{aligned} \label{eq:multiFMM} X\left(t_i \right) = M + \sum_{J=1}^{m}W\left(t_i; \upsilon_J \right) + e\left(t_i\right) \\ t_1 < t_2 < \dots < t_n; i = 1, \dots, n \nonumber \end{aligned} (\#eq:multiFMM)\] where \(W\left(t_i; \upsilon_J\right)\), hereinafter denoted by \(W_J\left(t_i\right)\), is the Jth FMM wave and,
The goodness of fit of an FMM model is measured with the \(R^2\) statistic that represents the proportion of the variance explained by the model out of the total variance, that is: \[\label{eq:multiFMM:R2} R^2 = 1 - \frac{\sum_{i=1}^{n} \left(X\left(t_i\right) - \hat{X}\left(t_i\right)\right)^2}{\sum_{i=1}^{n} \left(X\left(t_i\right) - \bar{X}\right)^2} (\#eq:multiFMMR2)\] where \(\hat{X}\left(t_i\right)\) represents the fitted value at \(t_i, i=1, \dots ,n\).
An iterative backfitting algorithm is proposed to derive estimates for the FMM parameters. Let \(\hat{W}_J^{\left(k\right)}\left(t_i\right)\) denote the fitted values from the Jth FMM wave at \(t_i, i=1,...,n\) in the kth iteration. The algorithm is structured as follows:
Initialize. Set \(\hat{W}_1^{\left(0\right)}\left(t_i\right) = \dots = \hat{W}_m^{\left(0\right)}\left(t_i\right) = 0\).
Backfitting step. For \(J = 1, \dots ,m\), calculate \[\label{eq:multiFMM:partialres} r_J^{\left(k\right)}\left(t_i\right) = X\left(t_i\right) - \sum_{I < J} \hat{W}_I^{\left(k\right)}\left(t_i\right) - \sum_{I > J} \hat{W}_I^{\left(k-1\right)}\left(t_i\right); I = 1, \dots, m (\#eq:multiFMMpartialres)\] and fit a monocomponent FMM model to \(r_J^{\left(k\right)}\left(t_i\right)\) obtaining \(\hat{\alpha}_J^{\left(k\right)}\), \(\hat{\beta}_J^{\left(k\right)}\), \(\hat{\omega}_J^{\left(k\right)}\) and \(\hat{W}_J^{\left(k\right)}\left(t_i\right)\).
Repeat the backfitting step until the stopping criterion is reached. The stopping criterion is defined as the difference between the explained variability in two consecutive iterations: \(R_k^2 - R_{k-1}^2 \leq C\), where \(R_k^2\) (defined in Equation @ref(eq:multiFMMR2)) is the proportion of variance explained by the model in the kth iteration and \(C\) a constant.
\(\hat{M}\) and \(\hat{A_J}\) are derived by solving \[\label{eq:multiFMM:backfitting} \min_{M \in \Re; A_J \in \Re^+} \sum_{i=1}^n \left(X\left(t_i\right) - M - \sum_{J=1}^{m} A_{J}\cos\left(\hat{\phi}_J\left(t_i\right)\right)\right)^2 (\#eq:multiFMMbackfitting)\] where \(\hat{\phi}_J\left(t_i\right) = \phi\left(t_i; \hat{\alpha}_J, \hat{\beta}_J, \hat{\omega}_J\right)\) defined in Equation @ref(eq:phase).
Modeling signals with repetitive shape-similar waves can be very useful in some applications (see Rodrı́guez-Collado and Rueda 2021a). In order to obtain more efficient estimators, equality constraints are imposed on the \(\beta\) and \(\omega\) parameters of an FMM\(_m\) model. In particular, we add \(d\) blocks of restrictions: \[\begin{aligned} \label{eq:multiFMM:betaRestr} \beta_1 & = \dots = \beta_{m_1} & \omega_1 = \dots = \omega_{m_1}\\ \beta_{m_1+1} & = \dots = \beta_{m_2} & \omega_{m_1+1} = \dots = \omega_{m_2} \nonumber \\ &\dots \nonumber \\ \beta_{m_{d-1}+1} & = \dots = \beta_{m_d} & \omega_{m_{d-1}+1} = \dots = \omega_{m_d}\nonumber \end{aligned} (\#eq:multiFMMbetaRestr)\]
The parameter estimation problem is solved by an adaptation of the standard procedure.
Given the unrestricted estimates obtained in step \(3\), the estimates for \(\beta_1, \beta_{m_1+1}, \dots, \beta_{m_{d-1}+1}\) under equality restrictions (Equation @ref(eq:multiFMMbetaRestr)) are computed as follows:
\[\begin{aligned} &\hat{\beta}_J^* = \mathrm{angularMean}\left(\hat{\beta}_1, \dots ,\hat{\beta}_{m_1}\right)& J & = 1, \dots ,m_1 \\ &\hat{\beta}_J^* = \mathrm{angularMean}\left(\hat{\beta}_{m_1+1}, \dots ,\hat{\beta}_{m_2}\right)& J & = m_1 + 1, \dots ,m_2 \\ &\dots \\ &\hat{\beta}_J^* = \mathrm{angularMean}\left(\hat{\beta}_{m_{d-1}+1}, \dots ,\hat{\beta}_{m_d}\right)& J & = m_{d-1} + 1, \dots ,m_d \end{aligned}\]
Then, the algorithm continues to the next step.
When constraints for the \(\omega\) parameters are incorporated, the grid search for the different \(\omega\) values is outside the backfitting loops. When the number of blocks is large, the estimation procedure can be computationally unaffordable. In order to reduce the execution time, a two-nested backfitting algorithm is proposed. In the outer backfitting loop, a block is fitted. In the inner loop, the FMM waves belonging to the same block are estimated. This procedure generates a close to optimal solution and is a less computationally expensive alternative.
The FMM code makes use of the doParallel package (Corporation and Weston 2020) to embed parallelization for the fitting process. Several utilities from the ggplot2 (Wickham 2016) and RColorBrewer (Neuwirth 2014) packages are occasionally necessary for the visualization of the fitted models.
The implementation of FMM is divided into four main
functionalities described in the next four sections: the fitting of the
FMM models, the new S4 object of class "FMM",
the graphical visualization of the fittings and the simulation of
synthetic data.
Some general details about the functions contained in the FMM package are shown in Table 1.
| Function | Description |
|---|---|
| Fitting function | |
fitFMM(vData, timePoints, nback, ...) |
Estimates an FMM\(_{nback}\) model to vData
observed at timePoints. |
| Utility functions | |
plotFMM(objFMM, ...) |
Graphically displays an object of class
"FMM". |
generateFMM(M, A, alpha, beta, omega, ...) |
Simulates values from an FMM model with parameters
\((M =\) M, \(A =\) A, \(\alpha =\) alpha\(, \beta =\) beta\(,\omega =\) omega\()\). |
getFMMPeaks(objFMM, ...) |
Estimates peak and trough times, together with signal values at those times, for each FMM wave. |
extractWaves(objFMM) |
Extracts individual contribution to the fitted values of each FMM wave. |
Standard methods for objects of class
"FMM" |
|
summary(), show(),
coef(), fitted() |
An FMM model can be fitted using the main function
fitFMM(). The description and default values of its inputs
arguments are shown in Table 2.
The fitting function fitFMM() requires the
vData input argument, which contains the data to be fitted.
Two other arguments can be used to control a basic fitting:
timePoints, which contains the specific time points of the
single period; and nback, with the number of FMM components
to be fitted. For some applications, such as the study of circadian
rhythms, data are collected over multiple periods. This information is
received by the fitFMM() function through the input
argument nPeriods. When nPeriods>1, the FMM
fitting is carried out by averaging the data collected at each time
point across all considered periods.
| Argument | Default value | Description |
|---|---|---|
vData |
no default value | A "numeric" vector containing the data to
be fitted by an FMM model. |
nPeriods |
\(1\) | A "numeric" value specifying the number of
periods at which vData is observed. |
timePoints |
NULL |
A "numeric" vector containing the time
points per period at which data is observed. When
timePoints = NULL an equally spaced sequence from \(0\) to \(2\pi\) will be assigned. |
nback |
\(1\) | A "numeric" value specifying the number of
FMM components to be fitted. |
betaOmegaRestrictions |
\(1:\)
nback |
An "integer" vector of length
nback indicating which FMM waves are constrained to have
equal \(\beta\) and \(\omega\) parameters. |
maxiter |
nback |
A "numeric" value specifying the maximum
number of iterations for the backfitting algorithm. |
stopFunction |
alwaysFalse |
Function to check the stopping criterion for the backfitting algorithm. |
lengthAlphaGrid |
\(48\) | A "numeric" value specifying the grid
resolution of the parameter \(\alpha\). |
lengthOmegaGrid |
\(24\) | A "numeric" value specifying the grid
resolution of the parameter \(\omega\). |
numReps |
\(3\) | A "numeric" value specifying the number of
times \(\left(\alpha,\omega\right)\)
parameters are refined. |
showProgress |
TRUE |
TRUE to display a progress indicator on
the console. |
showTime |
FALSE |
TRUE to display execution time on the
console. |
parallelize |
FALSE |
TRUE to use parallelized procedure to fit
a FMM model. |
restrExactSolution |
FALSE |
TRUE to obtain the optimal solution for
the restricted fitting. |
There are three key issues in the fitting process: the grid search of the pair \(\left(\alpha, \omega\right)\) to solve the estimation problem of a single FMM wave, the backfitting algorithm used for the estimation of the multicomponent models, and the incorporation of restrictions on \(\beta\) and \(\omega\) parameters. Each of these issues is controlled by several arguments described below.
lengthAlphaGrid and lengthOmegaGrid arguments
are used to set the grid resolution by specifying the number of equally
spaced \(\alpha\) and \(\omega\) values. Thus, the objective
function will be evaluated a total number of
(lengthAlphaGrid)\(\times\)(lengthOmegaGrid)
times, so when both arguments are large, the computational demand can be
high. By reducing the size of the sequences of the \(\alpha\) and \(\omega\) parameters, the algorithm will be
computationally more efficient. However, it may fail to obtain an
accurate estimation if the grid resolution is too sparse. An implemented
option to fine-tune the estimation of the parameters is to repeat the
fitting process a numReps of times, in such a way that, at
each repetition, a new two-dimensional grid of \(\left(\alpha, \omega\right)\) points is
created around the previous estimates. In addition, the
parallelize argument specifies whether a parallel
processing implementation is used.maxiter sets the maximum number of backfitting iterations.
Through the argument stopFunction, it is possible to set a
stopping criterion. Two criteria have been implemented as stop functions
in this package. When stopFunction = alwaysFalse,
maxiter iterations will be forced. If
stopFunction = R2(), the algorithm will be stopped when the
difference between the explained variability in two consecutive
iterations is less than a value pre-specified in the difMax
argument of R2() function.betaOmegaRestrictions sets the equality constraints for the
\(\beta\) and \(\omega\) parameters. For the unrestricted
case, betaOmegaRestrictions = 1:nback. To add restrictions,
"integer" vectors of length \(m\) can be passed to this argument, so that
positions with the same numeric value correspond to FMM waves whose
parameters, \(\beta\) and \(\omega\), are forced to be equal. Since
restricted fitting can be computationally intensive, a two-nested
backfitting algorithm can be used for the estimation of \(\omega\) parameters when the argument
restrExactSolution = FALSE."FMM"The fitFMM() function outputs an S4 object
of class "FMM" which contains the slots presented in
Table 3.
| Slot | Description |
|---|---|
timePoints |
A "numeric" vector containing the time
points for each data point if one single period is observed. |
data |
A "numeric" vector containing the data to
be fitted to an FMM model. Data could be collected over multiple
periods. |
summarizedData |
A "numeric" vector containing the
summarized data at each time point across all considered periods. |
nPeriods |
A "numeric" value containing the number of
periods in data. |
fittedValues |
A "numeric" vector of the fitted values by
the FMM model. |
M |
A "numeric" value of the estimated
intercept parameter \(M\). |
A |
An \(m\)-element
"numeric" vector of the estimated FMM wave amplitude
parameter(s) \(A\). |
alpha |
An \(m\)-element
"numeric" vector of the estimated FMM wave phase
translation parameter(s) \(\alpha\). |
beta |
An \(m\)-element
"numeric" vector of the estimated FMM wave skewness
parameter(s) \(\beta\). |
omega |
An \(m\)-element
"numeric" vector of the estimated FMM wave kurtosis
parameter(s) \(\omega\). |
SSE |
A "numeric" value of the residual sum of
squares values. |
R2 |
An \(m\)-element
"numeric" vector specifying the explained variance by each
of the fitted FMM components. |
nIter |
A "numeric" value containing the number of
iterations of the backfitting algorithm. |
The standard methods implemented for the class "FMM"
include the functions summary(), show(),
coef() and fitted(). These methods display
relevant information of the FMM fitting, and provide the estimated
parameters and fitted values. In addition, two more specific functions
have been implemented. Through the extractWaves() function,
the individual contribution of each FMM wave to the fitted values can be
extracted. Finally, the location of the peak and trough of each FMM
wave, as well as the value of the signal at these time points, can be
estimated using the getFMMPeaks() function. The required
argument of all these methods and functions is an object of the class
"FMM". Particularly, getFMMPeaks() has an
optional argument: timePointsIn2pi, that forces the peak
and trough locations to be returned into the interval from \(0\) to \(2\pi\) when it is TRUE.
The FMM package includes the function plotFMM()
to visualize the results of an FMM fit. The arguments of this function
are summarized in Table 4.
| Argument | Default value | Description |
|---|---|---|
objFMM |
no default value | The object of class "FMM" to be
plotted. |
components |
FALSE |
TRUE to display a plot of components. |
plotAlongPeriods |
FALSE |
TRUE to plot more than \(1\) period. |
use_ggplot2 |
FALSE |
TRUE to plot with ggplot2
package. |
legendInComponentsPlot |
TRUE |
TRUE to indicate if a legend should be
plotted in the component plot. |
textExtra |
empty string | Extra text to be added to the title of the plot. |
An object of class "FMM" can be plotted in two ways (see
Figure 1). The default graphical representation
will be a plot on which original data (as points) and the fitted signal
(as a line) are plotted together (left panel in Figure 1). The other possible representation is a component
plot for displaying each centered FMM wave separately (right panel in
Figure 1). Set the boolean argument
components = TRUE to show a component plot. When
legendInComponentsPlot = TRUE, a legend appears at the
bottom of the component plot to indicate the represented waves. The
argument textExtra allows an extra text to be added to the
title of both graphical representations.
As mentioned above, in some cases, data are collected from different
periods. All periods can be displayed simultaneously on the default plot
using plotAlongPeriods = TRUE. For the component plot, this
argument is ignored.
The argument use_ggplot2 provides a choice between
building the plot using base R graphics or ggplot2
packages. By default, the graphics package is used. When
use_ggplot2 = TRUE, a more aesthetic and customizable plot
is created using the ggplot2 package.
Data from an FMM model can be easily simulated using the function
generateFMM() of the package FMM. All input
arguments of this function are shown in Table 5, along with a short description and
their default values.
| Argument | Default value | Description |
|---|---|---|
M |
no default value | Value of the intercept parameter \(M\). |
A |
no default value | Vector of the FMM wave amplitude parameter \(A\). |
alpha |
no default value | Vector of the FMM wave phase translation parameter \(\alpha\). |
beta |
no default value | Vector of the FMM wave skewness parameter \(\beta\). |
omega |
no default value | Vector of the FMM wave kurtosis parameter \(\omega\). |
from |
\(0\) | Initial time point of the simulated data. |
to |
\(2\pi\) | Final time point of the simulated data. |
length.out |
\(100\) | Desired length of the simulation. |
timePoints |
seq(from, to, length = length.out) |
Time points at which the data will be simulated. |
plot |
TRUE |
TRUE when the simulated data should be
drawn on a plot. |
outvalues |
TRUE |
TRUE when the numerical simulation should
be returned. |
sigmaNoise |
\(0\) | Standard deviation of the Gaussian noise to be added. |
The main arguments of this function are M,
A, alpha, beta and
omega, whereby the values of the FMM model parameters are
passed to the function. All these arguments are "numeric"
vectors of length \(m\), except
M, which has length \(1\).
Longer and smaller vectors will be truncated or replicated as
appropriate.
By default, the data will be simulated at a sequence of \(100\) equally spaced time points from \(0\) to \(2\pi\). The arguments from,
to and length.out control such sequences. The
sequence can also be manually set using the argument
timePoints, in which case from,
to and length.out will be ignored.
The user can add a Gaussian noise by argument
sigmaNoise. A positive "numeric" value sets
the corresponding standard deviation of the Gaussian noise to be added.
To create the normally distributed noise, the rnorm()
function is used.
The arguments plot and outvalues, both
boolean values, determine the output of the generateFMM()
function. When outvalues = TRUE, a "list" with
input parameters, time points and simulated data is returned. These
elements are named input, t and
y, respectively. In addition, a scatter plot of
y against t can be drawn by setting
plot = TRUE.
The example below, based on FMM synthetic data, illustrates the basic
uses and capabilities of the functions implemented in the FMM
package. A set of \(100\) observations
is simulated from an FMM\(_4\) model
with intercept parameter \(M = 3\),
amplitude parameters: \(A_1 = 4\),
\(A_2 = 3\), \(A_3 = 1.5\) and \(A_4 = 1\), and phase translation
parameters: \(\alpha_1 = 3.8\), \(\alpha_2 = 1.2\), \(\alpha_3 = 4.5\) and \(\alpha_4 = 2\). With regard to the shape
parameters, pairs of waves are equal. Specifically, the shape parameters
satisfy: \[\begin{aligned}
& \beta_1 = \beta_2 = 3 & \omega_1 & =\omega_2 = 0.1\\
& \beta_3 = \beta_4 = 1 & \omega_3 & =\omega_4 = 0.05
\end{aligned}\] The standard deviation of the error term is set
at \(\sigma = 0.3\). We use the
function generateFMM() to simulate this data set. A
set.seed() statement is used to guarantee the
reproducibility of the results.
> library("FMM")
> set.seed(1115)
> rfmm.data <-generateFMM(M = 3, A = c(4,3,1.5,1), alpha = c(3.8,1.2,4.5,2),
+ beta = c(rep(3,2),rep(1,2)),
+ omega = c(rep(0.1,2),rep(0.05,2)),
+ plot = FALSE, outvalues = TRUE,
+ sigmaNoise = 0.3)
The estimation of an FMM\(_4\) can
be performed by setting nback = 4 in the fitting function
fitFMM(). The betaOmegaRestrictions parameter
allows a wide variety of shape restrictions to be incorporated into the
fitting procedure. In this example, to impose the shape restrictiction
on the fitting process, we use
betaOmegaRestrictions = c(1, 1, 2, 2).
> fit.rfmm <- fitFMM(vData = rfmm.data$y, timePoints = rfmm.data$t, nback = 4,
+ betaOmegaRestrictions = c(1, 1, 2, 2))
|--------------------------------------------------|
|==================================================|
Stopped by reaching maximum iterations (4 iteration(s))
The results are displayed by the function summary():
> summary(fit.rfmm)
Title:
FMM model with 4 components
Coefficients:
M (Intercept): 3.1661
A alpha beta omega
FMM wave 1: 4.0447 3.8048 3.0238 0.0930
FMM wave 2: 3.1006 1.1956 3.0238 0.0930
FMM wave 3: 1.6069 4.5228 1.0145 0.0427
FMM wave 4: 1.1194 1.9788 1.0145 0.0427
Peak and trough times and signals:
t.Upper Z.Upper t.Lower Z.Lower
FMM wave 1: 0.6741 5.3198 4.9354 -2.7565
FMM wave 2: 4.3482 3.4702 2.3263 -2.1742
FMM wave 3: 1.5345 -1.2330 1.3338 -4.1527
FMM wave 4: 5.2737 -1.7005 5.0730 -3.7565
Residuals:
Min. 1st Qu. Median Mean 3rd Qu. Max.
-0.719769 -0.162649 0.007025 0.000000 0.160127 0.904218
R-squared:
Wave 1 Wave 2 Wave 3 Wave 4 Total
0.5049 0.3906 0.0531 0.0276 0.9761
The FMM wave parameter estimates, as well as the peak and trough
times, together with the signal values at those times, are presented in
tabular form, where each row corresponds to a component and each column
to an FMM wave parameter. As part of the summary, a brief description of
the residuals, the proportion of variance explained by each FMM
component and by the global model are also shown. The
summary() output can be assigned to an object to get a
"list" of all the displayed results.
Other options to return the results are the functions
coef(), getFMMPeaks() and
resid(). The first two return a "list" similar
to those obtained with summary(). The resid()
method can be used to obtain the complete residuals vector. In addition,
the fitted values can be extracted by the function
fitted(), which returns a "data.frame" with
two columns: time points and fitted values.
The FMM plots can be generated in the R graphics or
ggplot2 packages. In the code example given below, we use
use_ggplot2 = TRUE to build Figure 1
based on ggplot2. The use of ggplot2 makes it easier
to customize our plots and modify features, such as scales, margins,
axes, etc. In Figure 1, the two possible FMM plots
are arranged via the grid.arrange() function of the gridExtra
package (Auguie 2017).
> library("RColorBrewer")
> library("ggplot2")
> library("gridExtra")
> # Plot the fitted FMM model
> titleText <- "Simulation of four restricted FMM waves"
> defaultrFMM2 <- plotFMM(fit.rfmm, use_ggplot2 = TRUE, textExtra = titleText) +
+ theme(plot.margin=unit(c(1,0.25,1.3,1), "cm")) +
+ ylim(-5, 6)
> comprFMM2 <- plotFMM(fit.rfmm, components=TRUE, use_ggplot2 = TRUE,
+ textExtra = titleText) +
+ theme(plot.margin=unit(c(1,0.25,0,1), "cm")) +
+ ylim(-5, 6) +
+ scale_color_manual(values = brewer.pal("Set1",n = 8)[3:6])
> grid.arrange(defaultrFMM2, comprFMM2, nrow = 1)
This section illustrates the use of the FMM package on the
analysis of real signals from chronobiology, electrocardiography and
neuroscience. To do this, the package includes four real-world data sets
in RData format which are described in the following
sections.
Chronobiology studies ubiquitous daily variations found in nature and in many aspects of the physiology of human beings, such as blood pressure or hormone levels (Mermet, Yeung, and Naef 2017). These phenomena commonly display signals with oscillatory patterns that repeat every \(24\) hours, usually known as circadian rhythms. In particular, circadian gene expression data have been deeply analyzed in the literature as they regulate the vast majority of molecular rhythms involved in diverse biochemical and cellular functions, see among others (Zhang et al. 2014), (Cornelissen 2014) and (Larriba et al. 2020).
The FMM package includes a data set called
mouseGeneExp that contains expression data of the
Iqgap2 gene from mouse liver. The liver circadian database is
widely extended in chronobiology since the liver is a highly rhythmic
organ with moderate levels of noise (Anafi et al.
2017; Larriba et al. 2018, 2020). The complete database is freely
available at NCBI GEO (http://www.ncbi.nlm.nih.gov/geo/), with GEO accession
number GSE11923. Gene expression values are given along \(48\) hours with a sampling frequency of
\(1\) hour/\(2\) days. Hence, data are collected along
two periods, and an FMM\(_1\) model is
fitted to the Iqgap2 average expressed values as follows:
> data("mouseGeneExp", package = "FMM")
> fitGene <- fitFMM(vData = mouseGeneExp, nPeriods = 2, nback = 1, showProgress = FALSE)
> summary(fitGene)
Title:
FMM model with 1 components
Coefficients:
M (Intercept): 10.1508
A alpha beta omega
FMM wave 1: 0.4683 3.0839 1.5329 0.0816
Peak and trough times and signals:
t.Upper Z.Upper t.Lower Z.Lower
FMM wave 1: 0.1115 10.6191 6.0686 9.6825
Residuals:
Min. 1st Qu. Median Mean 3rd Qu. Max.
-9.751e-02 -3.490e-02 2.269e-03 -1.530e-06 2.670e-02 1.890e-01
R-squared:
[1] 0.8752
The behavior of the FMM versus COS model to describe this asymmetric pattern has been compared in terms of \(R^2\). The FMM model clearly outperforms the COS one with an \(R^2\) of \(0.8752\) and \(0.2835\), respectively. In addition, a difference of \(4.73\) hours in peak time estimation between both models is observed, the FMM peak estimate being much more reliable, as is shown in Figure 2.
ECG records the periodic electrical activity of the heart. This activity represents the contraction and relaxation of the atria and ventricle, processes related to the crests and troughs of the ECG waveform. Heartbeats are decomposed into five fundamental waves, labelled as \(P\), \(Q\), \(R\), \(S\) and \(T\), corresponding to the different phases of the heart’s electric activity. The main features used in medical practice for cardiovascular pathology diagnosis are related to the location and amplitudes of these waves, and, of them, those labeled as \(P\), \(R\) and \(T\) are of particular interest (Bayes de Luna 2007). Standard ECG signals are registered using twelve leads, calculated from different electrode locations. Lead II is the reference signal, as it usually provides a good view of the main ECG waves (Meek and Morris 2002).
The FMM package includes the analysis of a typical ECG
heartbeat from the QT database (Laguna et al.
1997). This recording, from the subject sel100, belongs
to the Normal category, regarding Physionet’s pathology
classification (Goldberger et al. 2000).
The data illustrate the voltage of the heart’s electric activity,
measured in \(mV\), along the heartbeat
with a sampling frequency of \(250Hz\).
Specifically, the ECG signal from lead II in the fifth of the thirty
annotated heartbeats is analysed. Recordings are publicly available on
(http://www.physionet.org). Data are saved as
ecgData in the package. For an ECG heartbeat, an FMM\(_{ecg}\), a fifth order multicomponent FMM
model can be fitted with the instruction:
> data("ecgData", package = "FMM")
> fitEcg <- fitFMM(ecgData, nback = 5, showProgress = FALSE)
> summary(fitEcg)
Title:
FMM model with 5 components
Coefficients:
M (Intercept): 5.2717
A alpha beta omega
FMM wave 1: 0.6454 5.5151 3.2926 0.0325
FMM wave 2: 0.0994 4.4203 3.7702 0.1356
FMM wave 3: 0.2443 5.3511 0.6636 0.0323
FMM wave 4: 0.3157 5.5919 4.8651 0.0126
FMM wave 5: 0.0666 1.7988 2.1277 0.1632
Peak and trough times and signals:
t.Upper Z.Upper t.Lower Z.Lower
FMM wave 1: 2.3686 6.2370 3.1841 4.7241
FMM wave 2: 1.1905 4.9487 2.0693 4.6897
FMM wave 3: 2.3965 6.0828 2.1872 4.5551
FMM wave 4: 2.4210 5.7933 2.4719 4.7175
FMM wave 5: 5.1212 4.8646 4.3689 4.7228
Residuals:
Min. 1st Qu. Median Mean 3rd Qu. Max.
-0.0690885 -0.0095597 -0.0001127 0.0000000 0.0098533 0.0623569
R-squared:
Wave 1 Wave 2 Wave 3 Wave 4 Wave 5 Total
0.7645 0.0920 0.0581 0.0493 0.0278 0.9918
It is worth noting that the FMM package not only provides ECG signal-fitting (the left hand panel in Figure 3), but it also does wave decomposition and fiducial mark annotations on the desired waves (the right hand panel in Figure 3). It is clearly visible how the specific shapes of the five main waves contribute to drawing and explaining the lead II ECG waveform from the Normal morphology. See (Rueda, Larriba, and Lamela 2021) for a complete review of FMM\(_{ecg}\).
The study of the electrophysiological activity of neurons is one of the main research branches in neuroscience. The AP curves are oscillatory signals that serve as basic information units between neurons. They measure the electrical potential difference between inside and outside the cell due to an external stimulus. (Gerstner et al. 2014) can serve as a basic reference for electrophysiological neuroscience. Recently, the shape and other features of the AP have been used in problems such as spike sorting (Rácz et al. 2020; Souza et al. 2019; Caro-Martı́n et al. 2018) or neuronal cell type classification (Teeter et al. 2018; Gouwens et al. 2019; Mosher et al. 2020; Rodrı́guez-Collado and Rueda 2021b).
The package includes an example of a neuronal AP. The data were simulated with the renowned Hodgkin-Huxley model, first presented in (Hodgkin and Huxley 1952), which is defined as a system of ordinary differential equations and has been used in a wide array of applications, as it successfully describes the neuronal activity in various organisms. The simulation has been done using a modified version of the python package NeuroDynex available at (Gerstner et al. 2014). More concretely, a short square stimulus of \(12 \mu A\) has been applied to the neuron. The data can be accurately fitted by an FMM\(_2\) model as follows:
> data("neuronalSpike", package = "FMM")
> fitSingleAP <- fitFMM(neuronalSpike, nback = 2, showProgress = FALSE)
> summary(fitSingleAP)
Title:
FMM model with 2 components
Coefficients:
M (Intercept): 44.9474
A alpha beta omega
FMM wave 1: 52.9014 4.4160 3.0606 0.0413
FMM wave 2: 18.5046 4.6564 4.9621 0.0322
Peak and trough times and signals:
t.Upper Z.Upper t.Lower Z.Lower
FMM wave 1: 1.2777 110.8361 5.9669 -2.5002
FMM wave 2: 1.4319 36.9084 1.5649 -16.2572
Residuals:
Min. 1st Qu. Median Mean 3rd Qu. Max.
-14.3012 -1.0038 0.7472 0.0000 1.3230 24.8618
R-squared:
Wave 1 Wave 2 Total
0.9064 0.0604 0.9669
The goodness of fit of the FMM\(_2\) model can be ascertained in Figure 4. For comparison purposes, an FD model has been fitted with the same number of degrees of freedom. While the FD attains an \(R^2 = 0.3926\), the FMM model achieves a better fit with \(R^2 = 0.9669\).
Multiple AP curves, denominated spike or AP train, are usually observed as the response to a stimulus. Various models, such as the widely used leaky-and-fire models (Lynch and Houghton 2015), cut the signal into segments, each one containing an AP curve. Some authors suggest cutting the signal into even segments (Gerstner et al. 2014). However, the length of the segments turns out to be significantly different between different types of neurons, as explained in (Teeter et al. 2018), and unequal data segments can lessen the utility of some approaches. An important aspect to take into account is that the shape of the APs in the spike train is considered to be similar and, consequently, a restricted FMM model can accurately fit the entire signal.
The FMM package includes the data of a spike train composed of three AP curves. The proposed model for use with these data is an FMM\(_{ST}\) model, as defined in (Rodrı́guez-Collado and Rueda 2021a). Each AP is modeled by two components. The \(\beta\) and \(\omega\) parameters are constrained between AP curves. The code below fits the model.
> data("neuronalAPTrain", package = "FMM")
> nAPs <- 3; restriction <- c(rep(1,nAPs),rep(2,nAPs))
> fitAPTrain<-fitFMM(neuronalAPTrain, nback = nAPs*2,
betaRestrictions = restriction,
omegaRestrictions = restriction,
showProgress = FALSE, parallelize=TRUE)
> summary(fitAPTrain)
Title:
FMM model with 6 components
Coefficients:
M (Intercept): 135.4137
A alpha beta omega
FMM wave 1: 51.7069 6.1358 2.8172 0.0384
FMM wave 2: 52.0915 1.7541 2.8172 0.0384
FMM wave 3: 51.1140 4.2319 2.8172 0.0384
FMM wave 4: 20.3725 4.4778 4.8637 0.0552
FMM wave 5: 19.2429 1.9981 4.8637 0.0552
FMM wave 6: 19.6748 0.0973 4.8637 0.0552
Peak and trough times and signals:
t.Upper Z.Upper t.Lower Z.Lower
FMM wave 1: 3.0067 111.4319 2.5332 -1.2051
FMM wave 2: 4.9082 111.7323 4.4347 -1.4607
FMM wave 3: 1.1028 111.2700 0.6293 -0.1561
FMM wave 4: 1.2077 58.5986 1.4310 -14.2508
FMM wave 5: 5.0113 58.5537 5.2345 -13.3010
FMM wave 6: 3.1104 58.4889 3.3337 -13.7041
Residuals:
Min. 1st Qu. Median Mean 3rd Qu. Max.
-14.8618 -1.4929 0.5029 0.0000 1.6021 19.1978
R-squared:
Wave 1 Wave 2 Wave 3 Wave 4 Wave 5 Wave 6 Total
0.2524 0.2881 0.3501 0.0244 0.0276 0.0413 0.9839
In Figure 5, the fit of the FMM\(_{ST}\) model can be visualized. The goodness of fit of the model is excellent, achieving an \(R^2 = 0.9839\).
A general overview on the R package FMM, which implements the estimation of FMM models, is provided in this paper. The flexibility offered by these models to fit oscillatory signals of many different shapes makes them a very useful tool to model complex rhythmic patterns. The FMM methodology and its application to very diverse biological data has been described in previous papers (Rueda, Larriba, and Peddada 2019; Rueda, Larriba, and Lamela 2021; Rueda, Rodríguez-Collado, and Larriba 2021) and recently revised in (Rueda et al. 2021).
The package allows both single and multicomponent FMM models to be estimated. In order to provide greater flexibility, equality constraints for shape parameters have also been implemented. In addition, graphical representations of the fitted models and the possibility of generating synthetic data are available. The functionality of the package has been illustrated by simulated data and also by real examples from different areas of application related to present-day biological problems. The latest release of the FMM package is publicly available on CRAN (http://CRAN.R-project.org/package=FMM). A development version is also provided via GitHub at https://github.com/alexARC26/FMM where code contributions and bugs can be reported.
Possible future extensions of the FMM package include the implementation of additional restrictions to suit the model to other real signals; the possibility to include weights that determine how much each observation influences the parameter estimates; and the choice of an optimization technique, other than the Neldel-Mead method, in the estimation algorithm.
This research was partially supported by the Spanish Ministerio de Economía y Competitividad, grant PID2019-106363RB-I00, as well as the Call for predoctoral contracts of the UVa 2020. The authors thank the editors and two anonymous reviewers for their constructive comments that helped to improve this paper and the described package.