Abstract
Longitudinal studies are ubiquitous in medical and clinical research. Sample size computations are critical to ensure that these studies are sufficiently powered to provide reliable and valid inferences. There are several methodologies for calculating sample sizes for longitudinal studies that depend on many considerations including the study design features, outcome type and distribution, and proposed analytical methods. We briefly review the literature and describe sample size formulas for continuous longitudinal data. We then apply the methods using example studies comparing treatment versus control groups in randomized trials assessing treatment effect on clinical outcomes. We also introduce a Shiny app that we developed to assist researchers with obtaining required sample sizes for longitudinal studies by allowing users to enter required pilot estimates. For Alzheimer’s studies, the app can estimate required pilot parameters using data from the Alzheimer’s Disease Neuroimaging Initiative (ADNI). Illustrative examples are used to demonstrate how the package and app can be used to generate sample size and power curves. The package and app are designed to help researchers easily assess the operating characteristics of study designs for Alzheimer’s clinical trials and other research studies with longitudinal continuous outcomes. Data used in preparation of this article were obtained from the Alzheimer’s Disease Neuroimaging Initiative (ADNI) database (adni.loni.usc.edu).Longitudinal designs are generally preferred over cross-sectional research designs as they provide richer data and greater statistical power. As such, many biomedical and medical studies employ longitudinal designs to study changes over time in outcomes at the individual, group, or population level. Early in the design of a longitudinal experimental or natural history study, it is imperative to ensure that the study is adequately powered for its aims. Inadequate sample sizes leads to invalid or inconclusive inference and squandered resources (Lu, Mehrotra, and Liu 2009; Yan and Su 2006). On the other hand, oversampling squanders resources and exposes participants to unnecessary risks associated with the intervention (Lu, Mehrotra, and Liu 2009). Thus, optimal sample size and power analysis have become important prerequisites for any quantitative research design. Not only are these required during the design phase of research, but it has also become mandatory for ethical, scientific, or economic justification in submissions to institutional review boards and research funding agencies.
Determining the right sample size for a study is not a straightforward task. Despite the plethora of sample size formulas for repeated measures (Overall and Doyle 1994; Lui 1992; Rochon 1991; Y. Guo, Logan, and Glueck 2013), cluster repeated measures (A. Liu, Shih, and Gehan 2002), multivariate repeated measures (Vonesh and Schork 1986; X. Guo and Johnson 1996), longitudinal research designs (Lefante 1990), the tasks of gathering pilot estimates of the necessary parameters and getting the right software to carry out the computation can be challenging. Researchers commonly rely on formulas for very basic cross-sectional studies and adjust for attrition and other longitudinal design effects. Although such approaches can yield appropriate approximations, the ideal approach is to use a formula derived directly from the longitudinal model that researchers plan to eventually use on the study data.
(Y. Guo, Logan, and Glueck 2013) describe practical methods for the selection of appropriate sample size for repeated measures addressing issues of missing data, and the inclusion of more than one covariate to control for differences in response at baseline. Sample size formulas are refined depending on the specific situation and design features. For example, (Hedeker, Gibbons, and Waternaux 1999) considered a sample size for longitudinal designs comparing two groups that accounted for participant attrition or drop-out. (Basagaña, Liao, and Spiegelman 2011) derived sample size formulas for continuous longitudinal data with time-varying exposure variables typical of observational studies. Ignoring time-varying exposure was demonstrated to lead to substantial overestimation of the minimum required sample size which can be economically disadvantageous. In non-traditional longitudinal designs such as designs for mediation analysis of the longitudinal study, further refinements to sample size formulas are needed to ensure that sufficient sample sizes are obtained (Pan et al. 2018). However, these formulas usually require additional parameters such as exposure mean, variance, and intraclass correlations (Basagaña, Liao, and Spiegelman 2011), mediation effect, number of repeated measures (Pan et al. 2018), covariance structures (Rochon 1991), non-linear trends (Yan and Su 2006), missing, attrition or dropout rates (Roy et al. 2007; Lu, Luo, and Chen 2008), among others. Advanced sample size methods simultaneously handle several practical issues associated with the research design and complications that may arise during data collection. However, such methods are only available in commercial software (NCSS Statistical Software 2021; nQuery 2021).
Several R packages can be found on CRAN to compute sample size based on mixed-effect models and other specific designs depending on the area of applications. For example, (Martin et al. 2011) proposed a simulation-based power calculation and an R package pamm for random regression models, a specific form of mixed-effect model that detects significant variation in individual or group slopes. In their approach, a power analysis was performed to detect a specified level of individual and environmental interactions within evolution and ecology applications. This is achieved by simulating power to detect a given covariance structure. Other simulation-based packages for power analysis are the SIMR by (Green and MacLeod 2016) for linear and generalized linear mixed models and clusterPower by (Kleinman 2021) for cluster-randomized and cross-over designs. (Schoenfeld 2019) developed a power and sample size package called LPower (P. Diggle, Liang, and Zeger 1994) to perform power analysis for longitudinal design accounting for attrition and different random effect specification. The approach requires the specification of a design matrix, and the variance-covariance matrix of the repeated measures (Yi and Panzarella 2002). In pharmacokinetic study designs, (Kloprogge and Tarning 2015) developed the PharmPow power calculation package for mixed study designs including crossover and parallel designs. Quite recently, other R packages for performing power analysis exist for different designs; for example the powerMediation (Qiu 2021) for mediation effect, mean change for longitudinal study with 2-time points, the slope for simple Poisson regression, etc.; powerEQTL (Dong et al. 2021) for unbalanced one-way ANOVA in a Bulk Tissue and Single-Cell eQTL Analysis; WebPower (Zhang, Mai, and Yang 2021) for basic and advanced power correlation, proportion, \(t\)-test, one-way ANOVA, two-way ANOVA, linear regression, logistic regression, Poisson regression, mediation analysis, longitudinal data analysis, structural equation modeling, and multilevel modeling; among others. These packages were tailored towards very specific areas of application although the methods can be adapted and utilized for other disciplines. In terms of software applications, attempts have been made in recent times to implement power and sample size calculations in Shiny applications to facilitate easy usage. For example, (Hemming et al. 2020) developed an R Shiny app to conduct power analysis for several cluster designs including parallel, cross-over, and stepped-wedge designs. (Schoemann, Boulton, and Short 2017) Shiny application conducts power analysis for mediation model, and (Hu and Qu 2021) performs sample size and power calculations for a random coefficient regression model (RCRM) and a two-stage mixed-effects model.
As the analysis model and associated sample size formula become more sophisticated, estimating the parameters required by the formula becomes more challenging. A major hurdle to overcome is the availability of pilot data to inform these parameters. To assist Alzheimer’s researchers with this challenge, we have developed a power and sample size Shiny app for Alzheimer’s clinical trials. The app implements formulas for the linear mixed-effects model and mixed model for repeated measures [MMRM; Lu, Luo, and Chen (2008)] allowing the user to input their pilot estimates, or allow the app to generate pilot estimates using data from the Alzheimer’s Disease Neuroimaging Initiative [ADNI; Weiner et al. (2015)].
Continuous outcomes in clinical trial data collected longitudinally over time are commonly analyzed using linear mixed models [LMM; Laird and Ware (1982)] or MMRM (Mallinckrodt, Clark, and David 2001; Mallinckrodt et al. 2003). Before such trials, it is necessary to estimate the required sample size for a given treatment effect with desired power and Type I error. Various sample size approaches for longitudinal data have been proposed. We review a few of the most commonly used methods applied in Alzheimer’s disease trials with continuous outcomes.
Several sample size approaches have been developed by different authors. (P. J. Diggle et al. 2002) proposed sample size formulas for two approaches to continuous longitudinal outcomes, one that assumes a constant mean over time and compares the average response over time between groups, and another that assumes a linear change over time and compares the mean rate of change or slope difference between groups. In either case, the null hypothesis is that there is no difference between groups. Suppose that \(Y_{ij}\) is the response for participant \(i\) taken at time \(j\). To compare the average response between two groups (e.g. group \(A\) receiving active treatment and group \(B\) receiving control) across time, consider the model \[Y_{ij}=\beta_0+\beta_1T_i+\epsilon_{ij}\] where \(T_i\) is the treatment indicator which is coded 1 for participants in group \(A\) and 0 for participants in group \(B\), and \(\epsilon_{ij}\) the error term for subject, \(i\) at time \(j\), which is normally distributed with common variance, \(\sigma^2\). The sample size required per group to detect an average response between the two groups, \(\delta\), is given by \[m=\frac{2(Z_{1-\alpha/2}+Z_{P})^2\sigma^2(1+(J-1)\rho)}{J\delta^2}\] where \(\alpha\) is the type I error rate, \(P\) is the level of power, and \(corr(Y_{ij},Y_{ik})=\rho\) for all \(j\neq k\).
To test difference in the slopes or average rate of change between the two groups, consider the parameterization: \[Y_{ij}=\beta_{0A}+\beta_{1A}t_{ij}+\epsilon_{ij}, i=1,2,\dots,m; j=1,2,\dots,J.\] A similar parameterization can be specified for group \(B\) with parameters \(\beta_{0B}\), and \(\beta_{1B}\) representing the intercept and slope. The parameters \(\beta_{1A}\) and \(\beta_{1B}\) are the rate of change of the outcome for groups \(A\) and \(B\), respectively. Assume that \(var(\epsilon_{ij})=\sigma^2\) and \(corr(Y_{ij},Y_{ik})=\rho\) for all \(j\neq k\). If the outcome of each participant is measured at the same time, \(t_j\), then the sample size per group needed to detect a difference in the rate of change for each group, \(\Delta=\beta_A-\beta_B\), is given by \[m=\frac{2(Z_{1-\alpha/2}+Z_{P})^2\sigma^2(1-\rho)}{Js^2_t\Delta^2}\] where \(\alpha\) is the type I error rate, \(P\) is the power, and \(s^2_t=\frac{\sum (t_j-\bar{t})}{J}\) is the within-participant variance of \(t_j\).
Another sample size computation approach for correlated data is derived by (G. Liu and Liang 1997) to detect differences in the average response between two groups. This approach derived sample size following the generalized estimating equation (Liang and Zeger 1986) approach. Thus, different outcomes types can be handled. A special case is for a continuous response measured repeatedly over time and modeled using a linear model. Consider the model \[Y_{ij}=\beta_0+\beta_1T_i+\epsilon_{ij}, i=1,2,\dots, N,j=1,2,\dots,J\] where \(T_i\) is the treatment indicator, \(\epsilon_{ij}\) is the error term assumed to follow a multivariate normal distribution with a mean vector of zeros, \({\mathbf{0}}\), and covariance matrix, \(\sigma^2 {\mathbf{R}}\) and \({\mathbf{R}}\) is a \(J\times J\) working correlation matrix. Different covariance structures such as independent, exchangeable, auto-regressive, and unstructured can be assumed. To estimate the total sample size required, \(N\) for a given significance level (\(\alpha\)) and pre-specified nominal power, \(P\), the following formula is used: \[N=\frac{\left(Z_{1-\alpha/2}+Z_{P}\right)^2\sigma^2}{\pi_A\pi_B(\beta_1^2 {\mathbf{1}} {\mathbf{R}} ^{-1} {\mathbf{1}})}\] where \(\pi_A\) and \(\pi_B\) represent the proportion of observations in the control and treated groups, respectively. By assuming an exchangeable working correlation matrix, this formula reduces to \[N=\frac{\left(Z_{1-\alpha/2}+Z_{P}\right)^2\sigma^2[1+(J-1)\rho]}{\pi_A\pi_B\beta_1^2J}.\] It can be observed that for equal proportion of participants in each group i.e. \(\pi_A=\pi_B=\frac{1}{2}\), the formula for the sample size per group is equivalent to the sample size formula for difference in average response between groups by (P. J. Diggle et al. 2002) above. An appealing aspect of (G. Liu and Liang 1997) approach is that it can allow for unequal allocation to the two groups, and different outcome types are considered.
Another approach is based on the LMM analysis (Fitzmaurice, Laird, and Ware 2004) comparing mean rate of change between groups (Ard and Edland 2011; Zhao and Edland 2020). Consider the LMM \[Y_{ij}=X'_{ij} {\mathbf{\beta }}+b_{0i}+b_{1i}t_{ij}+\epsilon_{ij}\] where \(X_{ij}\) denotes a vector of indicator variables (treatment, time, and interaction terms), \({\mathbf{b}} _i=(b_{0i},b_{1i})'\sim N\left(\left(\begin{array}{cc} 0\\0 \end{array}\right),\left[\begin{array}{cc} \sigma^2_{b_0}&\sigma_{b_{0},b_{1}}\\ \sigma_{b_{0},b_1}&\sigma^2_{b_1} \end{array}\right]\right)\) are random participant-specific effects, and \(\epsilon_{ij}\sim N(0,\sigma^2_e)\) are the residual errors. The random terms, \({\mathbf{b}} _i\) and \(\epsilon_{ij}\) are assumed to be independent of each other. Following this model specification, (Ard and Edland 2011) proposed a total sample size formula given by \[N=\frac{4\left(Z_{1-\alpha/2}+Z_{P}\right)^2}{\Delta^2}\left[\sigma^2_{b_1}+\frac{\sigma^2_e}{\sum (t_j-\bar{t})^2}\right]\] where \(t_j\) indexes the times at which measures are made assuming all participants are observed at the same time points, \(\bar{t}\) is the mean of the times, \(\sigma^2_{b_1}\) is the variance of the participant-specific random slope, and \(\Delta\) is the difference in mean rate of change in treatment versus control group to be detected. For a random-intercept mixed-effect model, the total sample size formula reduces to \[N=\frac{4\left(Z_{1-\alpha/2}+Z_{P}\right)^2}{\Delta^2}\left[\frac{\sigma^2_e}{\sum (t_j-\bar{t})^2}\right].\] This sample size formula does not account for the correlation between the random intercept and slope, a limitation that can be mitigated by a recent approach by (Hu, Mackey, and Thomas 2021).
An alternative approach is to treat time as a categorical variable. This approach, common in trials of treatments of Alzheimer’s and other therapeutic areas, is often referred to as the MMRM (Mallinckrodt, Clark, and David 2001; Mallinckrodt et al. 2003; Lane 2008). The approach to fitting the MMRM is similar to that for other linear mixed-effect models for longitudinal or repeated measures except the unstructured modelling of time – treated as a categorical variable, and the specification of a within-participant correlation structure to account for association among the repeated measurements. The MMRM provides an estimate of the mean response for each time point category, for each group, and the resulting mean trajectory over time is unconstrained. The primary test statistic is usually the estimated group difference at the final time point. The null hypothesis is again that there is no difference between groups.
Suppose that \(Y_{ij}\) is the \(j\)th measurement taken of participant \(i; i=1,2,\dots,N\) at time, \(t_{ij}\) and \(T_i\) represents treatment or group indicator. Assuming we have two groups, \(a; =(A,B)\), let \(n_{a_j}\) be the number of participants at time \(j=1,2,\dots,J\) for group \(a\). Thus, at time point 1, the total number of participants is \(n_{A1}+n_{B1}\). The response variable is assumed to follow a multivariate normal distribution with mean, variance-covariance and correlation matrix given respectively as \[\begin{aligned} E(Y_{ij}|T_i=a)&=&\mu_{aj}\\ cov(Y_{ij}, Y_{ik}|T_i=a)&=&\sigma_{ajk}= {\mathbf{\Sigma }}_a\\ corr(Y_{ij}, Y_{ik}|T_i=a)&=&\frac{\sigma_{jk}}{\sigma_j\sigma_k}= {\mathbf{R}} \end{aligned}\] where \(\sigma_j=\sigma_{jj}^{\frac{1}{2}}\). It is common to assume that the variance-covariance matrix is the same for each group in which case we can have \({\mathbf{\Sigma }}_A= {\mathbf{\Sigma }}_B= {\mathbf{\Sigma }}\). Different correlation structures, \({\mathbf{R}}\), can be assumed (eg., compound symmetry, autoregressive and unstructured). The sample size formula for MMRM (Lu, Luo, and Chen 2008) can account for the attrition rate at each time point. Define the attrition rate \(\zeta_{aj}=1-r_{aj}\), where \(r_{aj}=\frac{n_{aj}}{n_{a1}}\) is the retention rate at time point, \(j\). The sample size for group \(A\) needed to achieve a power, \(P\), at a significance level, \(\alpha\) is given by \[n_{A1}=(\psi_A+\lambda\psi_B)(Z_{1-\alpha/2}+Z_{P})^2\frac{\sigma^2_J}{\delta_J}\] where \(\lambda=\frac{n_{A1}}{n_{B1}}\) is called the allocation ratio, \(\delta_J\) is the effect size (difference in mean response between the two groups) at the last time point, \(J\), and \(\psi_a\) is the \((J, J)\)th component of the variance inflation factor, \(I_a^{*-1}\) defined with respect to an estimation of the \(\mu_{aj}\) for treatment group, \(a\). Specifically, \(\psi_a=\left[I_a^{*-1}\right]_{JJ}\) where \[I_a^*=\sum_{j=1}^{J}(r_{a,j}-r_{a,j+1})\begin{bmatrix} {\mathbf{R}} _j^{-1}&0_{j\times (J-j)}\\0_{(J-j)\times j}&0_{(J-j)\times (J-j)} \end{bmatrix}\] The total sample size at the first time point is therefore given by \[n_{A1}+n_{B1}=(1+\lambda^{-1})(\psi_A+\lambda\psi_B)(Z_{1-\alpha/2}+Z_{P})^2\frac{\sigma^2_J}{\delta_J}.\]
The aforementioned sample size approaches have been implemented in an R package, longpower (Donohue and Edland 2020), that can be found on CRAN via the URL https://cran.r-project.org/web/packages/longpower/index.html. The package also contains functions which translate pilot mixed effect model parameters (e.g. random intercept and/or slope) into marginal model parameters so that the formulas of (P. J. Diggle et al. 2002) or (G. Liu and Liang 1997) or (Lu, Luo, and Chen 2008) formula can be applied to produce sample size calculations for two sample longitudinal designs assuming known variance.
The interactive Shiny (Chang et al. 2021) application available from the URL https://atrihub.shinyapps.io/power/ is an interface to the longpower package developed to easily generate sample size and conduct power analysis for a longitudinal study design with two-group comparisons for a continuous outcome. The app similarly implements the sample size formula of (G. Liu and Liang 1997) and (P. Diggle, Liang, and Zeger 1994; P. J. Diggle et al. 2002) using functions in the longpower package. A novel feature of the app is that it can generate required pilot estimates by sourcing data from the Alzheimer’s Disease Neuroimaging Initiative (ADNI).
ADNI is a population-based longitudinal cohort study that follows study participants to collect data on their clinical, cognitive, imaging including MRI and PET images, genetic, and biochemical biomarkers. The study was designed to discover, optimize, standardize, and validate clinical trial measures and biomarkers that are used in AD clinical research. This multi-site longitudinal study runs at about 63 sites in the US and Canada and began in 2004. All the data generated from the ADNI study are entered into a data repository hosted at the Laboratory of Neuroimaging (LONI) at the University of Southern California, the LONI Image & Data Archive (IDA). The data can be freely accessed upon request. Apart from the many uses of the data for advancing knowledge for AD trials (Weiner et al. 2015), this big data resource can be used to improve study design. Specifically, in this paper, the data is used to generate pilot estimates for the computation of sample size and power.
We consider an Alzheimer’s disease example using ADAS-Cog (Rosen, Mohs, and Davis 1984) pilot estimates
from the ADNI database. Suppose that we want to compute the sample size
required to detect an effect of 1.5 at the 5% level of significance and
80% power. The LMM fit to ADNI data has an estimated variance of random
slope for group A (placebo or control group) of 74, and a residual
variance of 10. Assuming study visits at (0.00, 0.25, 0.50, 0.75, 1.00,
1.25, 1.50) years, the sample size using the ‘edland’ approach can be
obtained by using the edland.linear.power() functions in
the longpower R package as follows.
> t = seq(0,1.5,0.25)
> edland.linear.power(delta=1.5, t=t, sig2.s = 24, sig2.e = 10, sig.level=0.05,
power = 0.80)
Zhao and Edland, in process
N = 414.6202
n = 207.3101, 207.3101
delta = 1.5
t = 0.00, 0.25, 0.50, 0.75, 1.00, 1.25, 1.50
p = 0, 0, 0, 0, 0, 0, 1
p_2 = 0, 0, 0, 0, 0, 0, 1
sig2.int = 0
sig.b0b1 = 0
sig2.s = 24
sig2.e = 10
sig2.int_2 = 0
sig.b0b1_2 = 0
sig2.s_2 = 24
sig2.e_2 = 10
sig.level = 0.05
power = 0.8
alternative = two.sided
NOTE: N is *total* sample size and n is sample size in *each* group
An alternative approach is to use lmmpower() and specify
the argument method="edland" in the following way.
> lmmpower(delta=1.5, t=t, sig2.s = 24, sig2.e = 10, sig.level=0.05,
power = 0.80,method="edland")
Zhao and Edland, in process
N = 414.6202
n = 207.3101, 207.3101
delta = 1.5
t = 0.00, 0.25, 0.50, 0.75, 1.00, 1.25, 1.50
p = 0, 0, 0, 0, 0, 0, 1
p_2 = 0, 0, 0, 0, 0, 0, 1
sig2.int = 0
sig.b0b1 = 0
sig2.s = 24
sig2.e = 10
sig2.int_2 = 0
sig.b0b1_2 = 0
sig2.s_2 = 24
sig2.e_2 = 10
sig.level = 0.05
power = 0.8
alternative = two.sided
NOTE: N is *total* sample size and n is sample size in *each* group
The Diggle and Liu & Liang approaches can be applied with the
diggle.linear.power() and
lui.liang.linear.power() functions, respectively. The
lmmpower() functions can be used for either approach with
the appropriate specification of the ‘method’ argument.
The second illustrative example is the hypothetical clinical trial discussed in (P. J. Diggle et al. 2002). Suppose that we are interested in testing the effect of a new treatment in reducing blood pressure through a clinical trial. The investigator is interested in randomizing participants between a control and active treatment group to have equal size. Three visits are envisaged with assessments planned at years 0, 2, and 5. Thus, \(n=3\) and \(s^2_x=4.22\). Assuming a type I error rate of 5%, power of 80% and smallest meaningful difference of 0.5 mmHg/year, we want to determine the number of participants needed for both treated and control groups for varying correlations (0.2, 0.5 and 0.8) and response variance (100, 200, 300).
> n <- 3
> t <- c(0,2,5)
> rho <- c(0.2, 0.5, 0.8)
> sigma2 <- c(100, 200, 300)
> tab = outer(rho, sigma2,
+ Vectorize(function(rho, sigma2){
+ ceiling(diggle.linear.power(
+ delta=0.5,
+ t=t,
+ sigma2=sigma2,
+ R=rho,
+ alternative="one.sided",
+ power = 0.80)$n[1])}))
> colnames(tab) = paste("sigma2 =", sigma2)
> rownames(tab) = paste("rho =", rho)
> tab
sigma2 = 100 sigma2 = 200 sigma2 = 300
rho = 0.2 313 625 938
rho = 0.5 196 391 586
rho = 0.8 79 157 235
The above code reproduces the table on page 29 of (P. J. Diggle et al. 2002). We also reproduce the table on page 30 of (P. J. Diggle et al. 2002) for detecting a difference in average response between two groups through the method by (G. Liu and Liang 1997) as follows:
> u = list(u1 = rep(1,n), u2 = rep(0,n)) #a list of covariate vectors or
matrices associated with the parameter of interest
> v = list(v1 = rep(1,n), v2 = rep(1,n)) #a respective list of covariate
vectors or matrices associated with the nuisance parameter
> rho = c(0.2, 0.5, 0.8) #correlations
> delta = c(20, 30, 40, 50)/100 #effect size
> tab = outer(rho, delta,Vectorize(function(rho, delta){
+ ceiling(liu.liang.linear.power(
+ delta=delta, u=u, v=v,
+ sigma2=1,
+ R=rho, alternative="one.sided",
+ power=0.80)$n[1])}))
> colnames(tab) = paste("delta =", delta)
> rownames(tab) = paste("rho =", rho)
> tab
delta = 0.2 delta = 0.3 delta = 0.4 delta = 0.5
rho = 0.2 145 65 37 24
rho = 0.5 207 92 52 33
rho = 0.8 268 120 67 43
The sample size formula for the MMRM approach is also implemented in
the longpower package’s power.mmrm() function. To
illustrate how this approach is implemented, consider a hypothetical
example with a correlation matrix having 0.25 as off-diagonal entries
(exchangeable), a retention vector (1, 0.90,0.80,0.70) and standard
deviation of 1, for group A. Assuming these values to be the same for
group B, then the sample size required to detect an effect size of 0.5
at 5% level of significance and 80% power is computed as follows:
> Ra <- matrix(0.25, nrow = 4, ncol = 4)
> diag(Ra) <- 1 #exchangeable correlation matrix for group A
> ra <- c(1, 0.90, 0.80, 0.70)#retention in group A
> sigmaa <- 1 #standard deviation for group A
> power.mmrm(Ra = Ra, ra = ra, sigmaa = sigmaa, delta = 0.5, power = 0.80)
Power for Mixed Model of Repeated Measures (Lu, Luo, & Chen, 2008)
n1 = 86.99175
n2 = 86.99175
retention1 = 1.0, 0.9, 0.8, 0.7
retention2 = 1.0, 0.9, 0.8, 0.7
delta = 0.5
sig.level = 0.05
power = 0.8
alternative = two.sided
Suppose the allocation ratio is 2, then the function argument
lambda=2 can be added.
We demonstrate how the app is used to perform power analysis by inputting user-specified values and a specific sample size approach. We make use of the values shown in Table 1. For the MMRM method, we specify the options assuming an exchangeable correlation structure.
| Parameter | LMM | MMRM |
|---|---|---|
| Start time | 0 | 1 |
| End time | 1.5 | 4 |
| Timestep | 0.5 | 1 |
| Type I error rate | 0.05 | 0.05 |
| Effect size | 1.5 | - |
| Estimate of variance of random intercept | 55 | - |
| Estimate of variance of random slope | 22 | - |
| Estimate of covariance of random intercept and slope | 29 | - |
| Estimate of the error variance | 10 | - |
| Standard deviation of observation in group A | - | 1 |
| Standard deviation of observation in group B | - | 1 |
| Exchangeable correlation | - | 0.25 |
| Allocation ratio | 1 | 2 |
For the ADNI-based pilot estimate generator, we select the full range of values for some selected variables (See Figure 3).
In this manuscript, we have presented the longpower R package, and a Shiny app dashboard that facilitates sample size and power analysis for a longitudinal study design with two-group comparisons of a continuous outcome. The app implements the sample size formulas of (G. Liu and Liang 1997), (P. Diggle, Liang, and Zeger 1994; P. J. Diggle et al. 2002), (Lu, Luo, and Chen 2008), and (Ard and Edland 2011) using functions in the longpower package. The package also handles models in which time is treated either as continuous (e.g. with random intercepts and slopes) or categorical (MMRM).
The longpower package was created to allow R users easy access to sample size formulas for longitudinal data that were already available in the literature. Many of the earlier papers on the topic provided no software, and so considerable effort was required by each reader to program implementations of the formulas. Collecting these formulas into an R package makes the methods more accessible and easy to compare. The package includes unit tests to ensure the software can adequately reproduce published results, and alternative approaches for the same study design are validated against each other.
A novel feature of the app is the ability to source pilot data for Alzheimer’s disease trials to generate required parameter estimates. We focus on Alzheimer’s data as our primary area of interest, but future work could bring in data from other disease areas. Other future directions include accommodating other outcome types, and keeping up with the evolving landscape of model parameterizations.
We are grateful to the ADNI study volunteers and their families. This
work was supported by Biomarkers Across Neurodegenerative Disease
(BAND-14-338179) grant from the Alzheimer’s Association, Michael J. Fox
Foundation, and Weston Brain Institute; and National Institute on Aging
grant R01-AG049750. Data collection and sharing for this project was
funded by the Alzheimer’s Disease Neuroimaging Initiative (ADNI)
(National Institutes of Health Grant U01 AG024904) and DOD ADNI
(Department of Defense award number W81XWH-12-2-0012). ADNI is funded by
the National Institute on Aging, the National Institute of Biomedical
Imaging and Bioengineering, and through generous contributions from the
following: AbbVie, Alzheimer’s Association; Alzheimer’s Drug Discovery
Foundation; Araclon Biotech; BioClinica, Inc.; Biogen; Bristol-Myers
Squibb Company; CereSpir, Inc.; Eisai Inc.; Elan Pharmaceuticals, Inc.;
Eli Lilly and Company; EuroImmun; F. Hoffmann-La Roche Ltd and its
affiliated company Genentech, Inc.; Fujirebio; GE Healthcare; IXICO
Ltd.; Janssen Alzheimer Immunotherapy Research & Development, LLC.;
Johnson & Johnson Pharmaceutical Research & Development LLC.;
Lumosity; Lundbeck; Merck & Co., Inc.; Meso Scale Diagnostics, LLC.;
NeuroRx Research; Neurotrack Technologies; Novartis Pharmaceuticals
Corporation; Pfizer Inc.; Piramal Imaging; Servier; Takeda
Pharmaceutical Company; and Transition Therapeutics. The Canadian
Institutes of Health Research is providing funds to support ADNI
clinical sites in Canada. Private sector contributions are facilitated
by the Foundation for the National Institutes of Health (www.fnih.org).
The grantee organization is the Northern California Institute for
Research and Education, and the study is coordinated by the Alzheimer’s
Therapeutic Research Institute at the University of Southern California.
ADNI data are disseminated by the Laboratory for Neuro Imaging at the
University of Southern California.
Data used in preparation of this article were obtained from the
Alzheimer’s Disease Neuroimaging Initiative (ADNI) database
(adni.loni.usc.edu). As such, the investigators within the ADNI
contributed to the design and implementation of ADNI and/or provided
data but did not participate in analysis or writing of this report. A
complete listing of ADNI investigators can be found at: http://adni.loni.usc.edu/wp-content/uploads/how_to_apply/ADNI_Acknowledgement_List.pdf
Conflict of Interest: None declared.