Abstract
The increase in life expectancy followed by the burden of chronic diseases contributes to disability at older ages. The estimation of how much chronic conditions contribute to disability can be useful to develop public health strategies to reduce the burden. This paper introduces the R package addhaz, which is based on the attribution method (W. J. Nusselder and Looman 2004) to partition disability into the additive contributions of diseases using cross-sectional data. The R package includes tools to fit the additive hazard model, the core of the attribution method, to binary and multinomial outcomes. The models are fitted by maximizing the binomial and multinomial log-likelihood functions using constrained optimization. Wald and bootstrap confidence intervals can be obtained for the parameter estimates. Also, the contribution of diseases to the disability prevalence and their bootstrap confidence intervals can be estimated. An additional feature is the possibility to use parallel computing to obtain the bootstrap confidence intervals. In this manuscript, we illustrate the use of addhaz with several examples for the binomial and multinomial models, using the data from the Brazilian National Health Survey, 2013.The increase in longevity observed worldwide is usually followed by the burden of chronic diseases, which are among the leading causes of disability late in life (Beard et al. 2016). Disability has become a public health priority due to its adverse effects on health outcomes and quality of life, resulting in increased costs of health care (Yang, Ding, and Dong 2014). Therefore, the identification of which chronic diseases are the main contributors to the disability burden plays an important role in the formulation of public health response to population aging (Klijs et al. 2011).
Although prospective studies are better suited to establish the
causal relationship between chronic diseases and disability, they are
costly and usually with limited sample size. Alternatively,
cross-sectional data has been widely used to investigate the association
of chronic diseases and disability. Among the methods based on
cross-sectional data, the attribution method proposed by (W. J. Nusselder and Looman 2004) has the
advantage of partitioning the disability prevalence into the additive
contributions of chronic diseases, taking into account multimorbidity
and that disability can be present even in the absence of chronic
diseases. The method was originally proposed for binary outcomes, but it
was recently extended to multicategory response variables (Renata T. C. Yokota et al. 2017) and it is
based on the binomial and multinomial additive hazard models,
respectively. The use of non-canonical link functions in the models
imposes a constraint on the linear predictor, which limits the use of
standard statistical software to fit the models, such as
glm in R or proc GLM in SAS (SAS Institute Inc. 2008). Despite this
practical difficulty, the attribution method for binary outcomes has
been widely used previously with data from the Netherlands (W. J. Nusselder and Looman 2004; Klijs et al.
2011), Belgium (W. J. Nusselder et al.
2005; Renata T. C. Yokota et al. 2015), Germany (Strobl et al. 2013), China (Chen et al. 2013), and Brazil (R. T. C. Yokota et al. 2016), owing to the
development of the software in R to fit the binomial model and to
estimate the contribution of diseases to the disability prevalence by
(W. Nusselder and Looman 2010) for non-R
users, which is available upon request to the authors (w.nusselder@erasmusmc.nl).
In this manuscript we present the R package addhaz, which is an extension of the R software developed by (W. Nusselder and Looman 2010), offering an open-source implementation of the binomial and multinomial additive hazard models. The R functions can also be used to calculate the contribution of chronic diseases to the disability burden for both models.
This paper is structured as follows. In Section 2, a brief description of the attribution method is presented, followed by the definition of the binomial and multinomial additive hazard models. Section 3 introduces some features and options of addhaz. The existing alternative software to fit the binomial and multinomial models is discussed in Section 4. Examples of how to use the R functions to fit the models and to estimate contributions are shown in Section 5, using the data of the 2013 Brazilian National Health Survey (BNHS). The main advantages and disadvantages of the attribution method and addhaz are discussed in Section 6. Finally, conclusions and recommendations for future research are outlined in Section 7.
Analogous to the mortality analysis, in which a single disease is assigned as underlying cause of death in the death certificate, the attribution method aims to assess the probability that a single reported disease was the cause of disability in a survey, taking into account that individuals can report more than one disease (multimorbidity) and that disability can be present without any reported diseases (W. J. Nusselder and Looman 2004; W. Nusselder and Looman 2010).
In the attribution method, the disability that is not associated with any disease included in the analysis is attributed to “background". The background can represent the effect of age-related losses in functioning; underreporting and underdiagnosed diseases; and other causes of disability that were not included in the survey or analysis. More details about the attribution method can be found elsewhere (W. J. Nusselder and Looman 2004; W. Nusselder and Looman 2010).
The following assumptions are required to fit the binomial and multinomial additive hazard models to cross-sectional data: (i) disability is caused by the diseases that are still present in the time of the survey and the background; (ii) the estimated cross-sectional cumulative rates reflect the transition rates that would have been estimated with longitudinal data (stationarity assumption); (iii) the recovery rate is zero; (iv) the ratio of the cause-specific cumulative rates to the overall cumulative rate is constant over time (proportionality assumption); (v) the start of the time (age) at risk to become disabled is the same for all diseases; (vi) individuals from the same age group are exposed to the same cumulative rate of disability for background; (vii) diseases and background act as independent competing causes of disability (W. J. Nusselder and Looman 2004; W. Nusselder and Looman 2010).
For binary outcomes, the binomial additive hazard model is defined as:
\[\label{eq:1} \begin{array}{ll} y_{i} \sim Bernoulli(\pi_{i}) \\ \pi_{i} = 1-\text{exp}{(-\eta_i)}\\ \eta_{i}= \alpha_{a_i} + \displaystyle \sum_{d=1}^m \beta_d X_{di} \\ \end{array} (\#eq:1)\]
where \(y_i\) is the binary disability outcome; \(\pi_i\) is the disability probability for individual \(i\); \(\eta_i\) is the linear predictor representing the overall cumulative rate (or cumulative hazard) of disability for individual \(i\); \({a_i}\) denotes the age group of individual \(i\) (with \(f\) age groups, \(a_i\) can only get values between \(1, \ldots, f\)); \(\alpha_{a}\) is the cumulative rate of disability for background by age group \(a\) (\(a=1,\ldots,f\)); \(\beta_d\) is the cumulative rate of disability (or disabling impact) for disease \(d(1, \ldots, m)\); and \(X_{di}\) is the indicator variable for disease \(d\) and individual \(i\).
A linear inequality constraint is applied to the linear predictor (\(\eta_{i} \geq 0\)) to ensure that \(\pi_i\) lies between \((0,1)\). The standard errors (\(SE\)) for the regression coefficients are estimated based on the inverse of the observed information matrix. The 95% Wald confidence intervals (Wald CI) can be obtained using the standard errors described above (Wald CI) as showed in @ref(eq:2) or via bootstrapping (Efron and Tibshirani 1994).
\[\label{eq:2} \begin{array}{ll} \text{95\% Wald CI} = \widehat{\alpha}_a \pm 1.96(\widehat{SE})\\ \text{95\% Wald CI} = \widehat{\beta}_d \pm 1.96(\widehat{SE}) \end{array} (\#eq:2)\]
The multinomial additive hazard model is an extension of the binomial model:
\[\label{eq:3} \begin{array}{ll} y_{ij} \sim Multinomial(1,\Pi_{i}) \\ \pi_{ij}= \left[1-\text{exp}{(-\sum \limits_{q=1}^{c} \eta_{iq})}\right]\left(\frac{\eta_{ij}}{\sum \limits_{q=1}^{c}\eta_{iq}} \right) \\ \eta_{ij}= \alpha_{a_ij} + \displaystyle \sum_{d=1}^m \beta_{dj} X_{di} \\ \end{array} (\#eq:3)\]
where \(y_{ij}\) is the multinomial response variable (disability) with one independent observation and vector of disability probabilities \(\Pi_{i} = (\pi_{i0}, \ldots, \pi_{ij}, \ldots, \pi_{ic})\) per individual \(i\); \(\pi_{ij}\) is the probability of disability category \(j\) for individual \(i\); \(\eta_{ij}\) is the linear predictor (overall cumulative rate) for disability category \(j\) and individual \(i\); \({a_i}\) denotes the age group of individual \(i\) (with \(f\) age groups, \(a_i\) can only get values between \(1, \ldots, f\)); \(\alpha_{aj}\) is the cumulative rate of disability category \(j\) for background by age group \(a\) \((a = 1, \ldots, f)\); \(\beta_{dj}\) is the cumulative rate of disability category \(j\) or disabling impact for disease \(d(1, \ldots, m)\); and \(X_{di}\) is the indicator variable for disease \(d\) and individual \(i\).
Besides the inequality constraint in the linear predictor \(\eta_{ij} \geq 0\), an additional constraint is required: \(\sum_{j=1}^c \pi_{ij} < 1\), to ensure that all \(\pi_{ij}\) > 0. Similar to the binomial case, the standard errors are estimated by the inverse of the observed information matrix and the 95% Wald CI and bootstrap percentile confidence intervals (bootstrap CI) can be obtained using addhaz.
The attribution of disability to chronic diseases depends on the disease prevalence (\(X_d\)) and the disabling impacts of the diseases (\(\beta_d\) and \(\beta_{dj}\)) (W. J. Nusselder and Looman 2004; W. Nusselder and Looman 2010). The contribution of chronic diseases and background to the disability prevalence can be calculated in five steps for both binary and multicategory response variables.
For the binary case, the cause-specific disability probabilities for individual \(i\) and each chronic condition (\(\widehat{D}_{di}\)) and the background (\(\widehat{B}_i\)) are estimated based on the proportionality assumption, analogous to the competing risks setting in mortality analysis (Manton and Stallard 1984; Chiang 1991):
\[\label{eq:4} \begin{array}{ll} \widehat{D}_{di} = \widehat{\pi_{i}}\left(\frac{\widehat{\beta}_{d} X_{di}}{\widehat{\eta_i}}\right) \\~\\ \widehat{B}_i = \widehat{\pi_{i}}\left(\frac{\widehat{\alpha}_{ai}}{\widehat{\eta_i}}\right) \\ \end{array} (\#eq:4)\]
Next, the number of disabled individuals by each disease (\(\widehat{N}_d\)) and background (\(\widehat{N}_b\)) are estimated as:
\[\label{eq:5} \begin{array}{ll} \widehat{N}_d = \displaystyle \sum_{i=1}^n \widehat{D}_{di}\\~\\ \widehat{N}_b = \displaystyle \sum_{i=1}^n \widehat{B}_i\\ \end{array} (\#eq:5)\]
The absolute contribution of each disease (\(\widehat{AC}_d\)) and background (\(\widehat{AC}_b\)) to the total disability prevalence is obtained by:
\[\label{eq:6} \begin{array}{ll} \widehat{AC}_d = \frac{\widehat{N}_d}{n}\\~\\ \widehat{AC}_b = \frac{\widehat{N}_b}{n}\\ \end{array} (\#eq:6)\]
The absolute contribution of background and diseases defined above sum to the disability prevalence (\(\widehat{P}\)): \[\label{eq:7} \begin{array}{ll} \widehat{P} = \widehat{AC}_b + \displaystyle \sum_{d=1}^m \widehat{AC}_d\\ \end{array} (\#eq:7)\]
Finally, the relative contribution of diseases (\(\widehat{RC}_d\)) and background (\(\widehat{RC}_b\)) to the disability prevalence is estimated by:
\[\label{eq:8} \begin{array}{ll} \widehat{RC}_d = \frac{\widehat{AC}_d}{\widehat{P}}\\~\\ \widehat{RC}_b = \frac{\widehat{AC}_b}{\widehat{P}}\\ \end{array} (\#eq:8)\]
The relative contributions (\(\widehat{RC}_d\) and \(\widehat{RC}_b\)) sum to 1.
Analogous to the binomial case, the contribution of chronic diseases to the disability prevalence for multinomial outcomes can be obtained in five steps for each category \(j\) of the outcome:
1. Cause-specific disability probabilities for each disease (\(\widehat{D}_{dij}\)) and background (\(\widehat{B}_{ij}\)) for individual \(i\): \[\label{eq:9} \begin{array}{ll} \widehat{D}_{dij} = \widehat{\pi}_{ij}\left(\frac{\widehat{\beta}_{dj} X_{di}}{\widehat{\eta}_{ij}}\right)\\~\\ \widehat{B}_{ij} = \widehat{\pi}_{ij}\left(\frac{\widehat{\alpha}_{a_ij}}{\widehat{\eta}_{ij}}\right)\\ \end{array} (\#eq:9)\]
2. Number of disabled individuals by each disease (\(\widehat{N}_{dj}\)) and background (\(\widehat{N}_{bj}\)): \[\label{eq:10} \begin{array}{ll} \widehat{N}_{dj} = \displaystyle \sum_{i=1}^n \widehat{D}_{dij}\\~\\ \widehat{N}_{bj} = \displaystyle \sum_{i=1}^n \widehat{B}_{ij}\\ \end{array} (\#eq:10)\]
3. Absolute contribution of each disease (\(\widehat{AC}_{dj}\)) and background (\(\widehat{AC}_{bj}\)) to the total disability prevalence: \[\label{eq:11} \begin{array}{ll} \widehat{AC}_{dj} = \frac{\widehat{N}_{dj}}{n}\\~\\ \widehat{AC}_{bj} = \frac{\widehat{N}_{bj}}{n}\\ \end{array} (\#eq:11)\]
4. Total disability prevalence (\(\widehat{P}_j\)): \[\label{eq:12} \begin{array}{ll} \widehat{P}_j = \widehat{AC}_{bj} + \displaystyle \sum_{d=1}^m \widehat{AC}_{dj}\\ \end{array} (\#eq:12)\]
5. Relative contribution of diseases (\(\widehat{RC}_{dj}\)) and background (\(\widehat{RC}_{bj}\)) to the disability prevalence: \[\label{eq:13} \begin{array}{ll} \widehat{RC}_{dj} = \frac{\widehat{AC}_{dj}}{\widehat{P}_j}\\~\\ \widehat{RC}_{bj} = \frac{\widehat{AC}_{bj}}{\widehat{P}_j}\\ \end{array} (\#eq:13)\]
The absolute contributions defined in equations @ref(eq:11) sum to the prevalence of disability for each category \(j\) and the relative contributions defined in equations @ref(eq:13) sum to 1 for each disability category \(j\). The confidence intervals for the absolute and relative contributions for the binary and multinomial cases can be estimated via bootstrapping (Efron and Tibshirani 1994) in addhaz.
In this section, a brief explanation about the constrained optimization, the parallel option to obtain the bootstrap CI for the parameter estimates, and the option to perform the likelihood ratio test for model selection is provided.
The binomial and multinomial additive hazard models are generalized
linear models with non-canonical link functions \(\eta_i = \text{log}
\left(\frac{1}{1-\pi_i}\right)\) for the binomial model and \(\eta_{ij} = [-\text{log}(1-\sum_{q=1}^c
\pi_{iq})]\left(\frac{\pi_{ij}}{\sum_{q=1}^c \pi_{iq}}\right)\)
for the multinomial model. For both models, the model parameters are
estimated using maximum likelihood. The use of non-canonical link
functions requires a constraint in the linear predictors (\(\eta_i \geq 0\) and \(\eta_{ij} \geq 0\)) to ensure that the
disability probabilities (\(\pi_{i}\)
and \(\pi_{ij}\)) are valid, i.e., the
probabilities lie between 0 and 1. In addhaz, this constraint
is implemented in the optimization procedure, with an adaptive barrier
algorithm (Lange 2010), by calling
constrOptim in R.
Besides the option to estimate the confidence intervals for the parameter estimates based on the standard errors obtained by the inverse of the information matrix for the binomial and multinomial models (Wald CI), addhaz also offers the user the option to obtain the bootstrap CI based on empirical percentiles of the replicates (Efron and Tibshirani 1994).
To reduce computation time, parallel computing can be used to obtain
the bootstrap CI. By default R does not use all the cores available on a
computer. The basic idea of parallel computing is to split the work to
more than one core of the computer, to execute it in parallel, and then
to collect the results. Several R packages can be used to implement
parallel computation. In addhaz it is implemented by calling
the functions boot and boot.ci in the boot package
(Canty and Ripley 2016; Davison and Hinkley
1997).
The package also includes a function to perform the likelihood ratio test to compare two binomial or multinomial nested models that can be used for model selection.
The likelihood ratio test is defined as -2*log(likelihood model 1/likelihood model 2).The resulting test statistic is assumed to follow a \(\chi^2\) distribution, with degrees of freedom (df) equal to the difference of the df between the models. If the test is statistically significant, the model with more variables fits the data significantly better than the model with less variables.
The original software for the attribution method (W. J. Nusselder and Looman 2004; W. Nusselder and Looman 2010) was developed in R, but it is not available as an R package, since it focuses on non-R users: an Excel file is used to input the model arguments and this file is called in the R code. This software is restricted to binary outcomes and it is freely available upon request to the authors. Different from addhaz, a penalty term is added to the binomial likelihood function when \(\pi_i \leq 0\), to ensure that valid probabilities are obtained. The original software also allows the user to obtain the bootstrap CI for the model parameters and contributions. Additionally, it offers the options: (i) reduced rank regression (RRR) (T. Yee 2014) to reduce the number of parameters when interactions between diseases and age groups are of interest; and (ii) model selection, using the likelihood ratio test.
Besides the original software, the parameter estimates of the
binomial additive hazard model can be obtained using the R packages stats with
glm and logbin
with the function logbin (Donoghoe
2016). In both packages, the log-binomial model, i.e., \(\pi_i=\text{exp}(\eta_i)\), used to
estimate relative risks (Marschner and Gillett
2012), can be fitted to a transformed version of the response
variable \(y^{*} = 1 - y\), with the
log link function. The estimated model parameters should be multiplied
by \(-1\), since \(1-\pi_i = \text{exp}(-\eta_i)\). However,
care should be taken with glm: by specifying the option
family = binomial(link = log) to fit the log-binomial
model, convergence failure may occur with the iterative weighted least
squares (IWLS) algorithm when the maximum likelihood estimates (MLE) lie
on the boundary of the parameter space. In glm, the IWLS is
modified so that if constraints are violated, step-halving is used
iteratively until they are respected. Although this should not result in
invalid estimates, it may cause difficulty in convergence. An advantage
of logbin is that it includes constrained optimization as an
option to optimize the binomial log-likelihood function
(method = "ab"). This is done by calling
constrOptim in R to constrain the parameter space.
Since addhaz was developed with focus on the attribution method, apart from estimating the model parameters for the binomial additive hazard model, it also gives the user the option to obtain the contribution of diseases to the disability prevalence and to obtain bootstrap CI for the parameter estimates and the contributions, using parallel computing to reduce computation time.
To our knowledge, there is no other package available to fit the
multinomial additive hazard model. Although it is possible to fit the
log-multinomial model (Blizzard and Hosmer
2007), i.e., \(\pi_{ij}=\text{exp}(\eta_{ij})\), with the
function vglm in VGAM (T. W. Yee 2016), different from the binomial
case, no simple transformation of the outcome can be applied to obtain
the parameter estimates of the multinomial additive hazard model using
the log-multinomial model.
In this section, the use of the functions BinAddHaz,
MultAddHaz, and LRTest in addhaz are
illustrated using a subset of the 2013 BNHS available in the package. A
selected output of the results is shown.
The Brazilian National Health Survey (BNHS) (“Pesquisa Nacional de Saúde") was a nationally representative survey of the Brazilian adult population (\(\geq 18\) years) with approximately 60,000 individuals, conducted in 2013. A multistage sampling design with simple random sampling (census tracts) and clustering (households and adults) was used. The response rate was 77%. Survey weights were included to take into account the complex design of the sample. Detailed information about the BNHS can be found in previous publications (Szwarcwald et al. 2014; R. T. C. Yokota et al. 2016).
In addhaz, a subset of the BNHS data is available, with women aged \(\geq\) 60 years (n = 6,294) and the following variables: disability as binary and multinomial outcomes, survey weight, age, diabetes, arthritis, and stroke (Table 1).
| Variable name | Definition | Type | Categories |
|---|---|---|---|
| wgt | survey weight | continuous | - |
| age | age group | binary | 0: 60-79 years |
| 1: \(\geq\)80 years | |||
| diab | diabetes | binary | 0: no |
| 1: yes | |||
| arth | arthritis | binary | 0: no |
| 1: yes | |||
| stro | stroke | binary | 0: no |
| 1: yes | |||
| dis.bin | binary disability outcome | binary | 0: no disability |
| 1: disabled | |||
| dis.mult | multinomial disability outcome | categorical | 0: no disability |
| 1: mild disability | |||
| 2: severe disability |
The binomial and multinomial disability outcomes were defined based on five instrumental activities of daily living (IADL): shopping, handling finances, taking own medications, going to the doctor, and using transportation. Participants were asked about the degree of difficulty in performing IADL tasks, with possible answers: “1. Unable",”2. A lot of difficulty", “3. Some difficulty", or”4. No difficulty". The definition of the binary and multinomial outcomes is shown in Table 2. The reference category “No disability" represents answer”4" to all IADL questions.
| Outcome | Outcome category | Answer to at least one IADL question |
|---|---|---|
| Binary | Disabled | 1, 2 or 3 |
| Multinomial | Mild disability | 3 |
| Severe disability | 1 or 2 |
A summary of the characteristics of the study population is shown in Table 3. A higher proportion of older women (\(\geq\)80 years), diabetes, arthritis, stroke, and the disease pairs was observed in women with mild and severe disability compared to women without disability.
| Characteristic | Total | No disability | Mild disability | Severe disability | ||||
|---|---|---|---|---|---|---|---|---|
| 2-9 | N | % | N | % | N | % | N | % |
| Age (years) | ||||||||
| 60-79 | 5379 | 85.5 | 3946 | 93.5 | 642 | 78.3 | 791 | 63.0 |
| \(\geq\)80 | 915 | 14.5 | 273 | 6.5 | 178 | 21.7 | 464 | 37.0 |
| Diseases | ||||||||
| Diabetes | 1243 | 19.7 | 697 | 16.5 | 190 | 23.2 | 356 | 28.4 |
| Arthritis | 1428 | 22.7 | 819 | 19.4 | 211 | 25.7 | 398 | 31.7 |
| Stroke | 286 | 4.5 | 100 | 2.4 | 41 | 5.0 | 145 | 11.6 |
| Diabetes and arthritis | 298 | 4.7 | 135 | 3.2 | 53 | 6.5 | 110 | 8.8 |
| Diabetes and stroke | 91 | 1.4 | 31 | 0.7 | 13 | 1.6 | 47 | 3.7 |
| Arthritis and stroke | 79 | 1.3 | 28 | 0.7 | 7 | 0.9 | 44 | 3.5 |
The function BinAddHaz fits the binomial additive hazard
model and estimates the contribution of diseases to the disability
burden for binary outcomes in addhaz. To illustrate the use of
BinAddHaz, five models were fitted with the binary
disability outcome: model 1 - with three diseases (no background and
diseases by age); model 2 - with only background by age, with bootstrap
CI; model 3 - with only diseases by age; model 4 - with background and
diseases by age, with bootstrap CI; model 5 - with two-way interaction
between diseases. To illustrate how the LRTest function can
be used for model selection, models 2 and 4 are compared.
Before fitting model 1, addhaz and the data can be loaded and the names of the variables can be checked using:
library("addhaz")
data("disabData")
names(disabData)
[1] "dis.bin" "dis.mult" "wgt" "age" "diab" "arth" "stro"
The first binomial model can be fitted with:
model1 <- BinAddHaz(dis.bin ~ diab + arth + stro , data = disabData, weights = wgt,
attrib = TRUE, type.attrib = "both", collapse.background = FALSE,
attrib.disease = FALSE)
Since no attribution variable such as age was included in the model,
the arguments collapse.background and
attrib.disease were set to FALSE. The results
of the model were stored as an object called model1, which
can be checked with the summary function:
summary(model1)
$`call`
BinAddHaz(formula = dis.bin ~ diab + arth + stro, data = disabData,
weights = wgt, attrib = TRUE, collapse.background = FALSE,
attrib.disease = FALSE, type.attrib = "both")
$bootstrap
[1] FALSE
$coefficients
Estimate StdErr t.value p.value
(Intercept) 0.2970833 0.009426403 31.516082 1.498823e-202
diab 0.1331831 0.023821666 5.590838 2.354697e-08
arth 0.1306445 0.022203662 5.883917 4.212860e-09
stro 0.5927519 0.075697633 7.830521 5.663378e-15
attr(,"class")
[1] "summary.binaddhazmod"
The first element of the output call is the formula used
to fit the model. The bootstrap, indicates if the bootstrap
CI was requested. Since it was not requested, its value is
FALSE.
Next, the coefficients are printed, with their estimates, standard errors (calculated based on the inverse of the observed information matrix), the \(t\) value (value of the \(t\) statistic), and the \(p\) value. The intercept represents the background. According to this output, all diseases were significant in the model. To identify the most disabling diseases, i.e., the diseases with highest cumulative rate of disability, the coefficients can be sorted in decreasing order using:
sort(model1$coefficients, decreasing = TRUE)
stro (Intercept) diab arth
0.5927519 0.2970833 0.1331831 0.1306445
Stroke was the most disabling disease, while arthritis was the least disabling disease. The 95% Wald CI can be obtained by:
model1$ci
CI2.5 CI97.5
(Intercept) 0.27860754 0.3155590
diab 0.08649261 0.1798735
arth 0.08712532 0.1741637
stro 0.44438455 0.7411193
Both the relative and absolute contributions were requested
(attrib = TRUE, type.attrib = "both") and can be assessed
with:
model1$contribution
$`att.rel`
att.rel
(Intercept) 0.80405374
diab 0.06938567
arth 0.07451155
stro 0.05204903
$att.abs
att.abs
disab 0.30853338
(Intercept) 0.24807742
diab 0.02140780
arth 0.02298930
stro 0.01605886
In the above output, the relative contribution (att.rel:
the contributions sum to 1) is shown at the top and the absolute
contribution (att.abs: the contributions sum to the
disability prevalence) is presented at the bottom. No confidence
intervals are provided for the contributions, as they can only be
calculated via bootstrapping and this option was not requested.
In the output for the absolute contribution, the disability
prevalence (disab) was 30.9%. The absolute contribution can
be sorted in decreasing order using:
model1$contribution$att.abs[order(model1$contribution$att.abs[, 1], decreasing = TRUE), ]
disab (Intercept) arth diab stro
0.30853338 0.24807742 0.02298930 0.02140780 0.01605886
The background (Intercept) was the main contributor to
the disability burden in this population. In this case, it can represent
other causes not included in the model such as dementia or back pain,
which are important causes of disability in the older population, but
were not included in the analysis. Among the three diseases, arthritis
was the main contributor to the disabilitiy burden in older Brazilian
women.
It is interesting to note that, although stroke was the most disabling disease, it showed the lowest contribution to the total disability prevalence. This low contribution can be a consequence of the low occurrence of stroke in this population - 4.5% (Table 3) - as the contribution of chronic conditions to the disability prevalence depends on both, the disease occurrence and the disabling impact (W. Nusselder and Looman 2010).
In model 2, the background is modelled by age group, but the diseases are not. The model can be fitted by:
model2 <- BinAddHaz(dis.bin ~ factor(age) -1 + diab + arth + stro , data = disabData,
weights = wgt, attrib = TRUE, type.attrib = "both",
collapse.background = FALSE, attrib.disease = FALSE, seed = 111,
bootstrap = TRUE, conf.level = 0.95, nbootstrap = 1000,
parallel = TRUE, type.parallel = "snow", ncpus = 4)
Since the background is modelled by age group,
factor(age) is included in the model. The -1
is included in the model.formula to obtain the coefficients
for both age groups, including the reference category. Since the
background is modelled by age, it should not be collapsed by age
(collapse.background = FALSE). As no interaction between
diseases and age were included in the model, the argument
attrib.disease is FALSE. The option
seed = 111 allows the user to specify a random number to
make the results of the bootstrapping reproducible. In the example
above, the random number used was 111. The bootstrap CI for the
regression coefficients and the contributions was requested
(bootstrap = TRUE), with confidence level = 0.95
(conf.level = 0.95). The bootstrap CI was based on 1000
replicates (nbootstrap = 1000) and it was obatined with
parallel computing (parallel = TRUE) on Windows
(type.parallel = "snow") with 4 CPUS
(ncpus = 4).
The summary of the model can be assessed with:
summary(model2)
$`call`
BinAddHaz(formula = dis.bin ~ factor(age) - 1 + diab + arth + stro, data = disabData,
weights = wgt, attrib = TRUE, type.attrib = "both",
collapse.background = FALSE, attrib.disease = FALSE, seed = 111,
bootstrap = TRUE, conf.level = 0.95, nbootstrap = 1000,
parallel = TRUE, type.parallel = "snow", ncpus = 4)
$bootstrap
[1] TRUE
$coefficients
Estimate CILow CIHigh
factor(age)0 0.22345184 0.19892365 0.2529054
factor(age)1 1.10472873 0.95279272 1.2705886
diab 0.12935797 0.06191508 0.2044457
arth 0.08531865 0.02375110 0.1513319
stro 0.52453664 0.28688675 0.8509963
$conf.level
[1] 0.95
attr(,"class")
[1] "summary.binaddhazmod"
Since the bootstrap CI was requested, bootstrap is
TRUE. The coefficients show that age and all diseases were
significant (the null value, i.e. 0, is not included in the bootstrap
CI). The factor(age)0 and factor(age)1
represents the background cumulative rates for age groups 0 and 1,
respectively. The contributions can be checked with:
model2$contribution
$att.rel
Contribution CILow CIHigh
factor(age)0 0.53992912 0.51937657 0.56054196
factor(age)1 0.29842186 0.27575265 0.32163433
diab 0.06702577 0.06162503 0.07256112
arth 0.04869951 0.04513817 0.05269298
stro 0.04592374 0.03666781 0.05519789
$att.abs
Contribution CILow CIHigh
disab 0.30902546 0.30227641 0.31623119
factor(age)0 0.16685185 0.16405831 0.16947954
factor(age)1 0.09221995 0.08372162 0.10131428
diab 0.02071267 0.01906935 0.02254047
arth 0.01504939 0.01396744 0.01628476
stro 0.01419161 0.01127267 0.01727091
The contributions and the 95% bootstrap CI are shown. The background is the main contributor to the disability prevalence. Note that by allowing the background to vary by age does not change the disability prevalence (30.9%), as compared to model 1.
In Model 3, we allow only the diseases to vary by age group by including interactions between age and diseases in the model. Before fitting model 3, a matrix with the diseases to be included in the model can be defined by:
disease <- as.matrix(disabData[, c("diab", "arth", "stro")])
The first six elements of the matrix created can be checked using:
head(disease)
diab arth stro
36 1 0 0
98 0 0 0
113 0 1 1
347 1 0 0
352 1 0 0
436 0 0 0
The binomial additive hazard model can be fitted with the function:
model3 <- BinAddHaz(dis.bin ~ disease:factor(age), data = disabData, weights = wgt,
attrib = TRUE, attrib.var = age, type.attrib = "abs",
collapse.background = FALSE, attrib.disease = TRUE)
summary(model3)
$`call`
BinAddHaz(formula = dis.bin ~ disease:factor(age), data = disabData, weights = wgt,
attrib = TRUE, attrib.var = age, collapse.background = FALSE,
attrib.disease = TRUE, type.attrib = "abs")
$bootstrap
[1] FALSE
$coefficients
Estimate StdErr t.value p.value
(Intercept) 0.29425017 0.009339432 31.5062168 1.991333e-202
diseasediab:factor(age)0 0.07487954 0.022708458 3.2974294 9.811601e-04
diseasearth:factor(age)0 0.01218173 0.020156904 0.6043454 5.456358e-01
diseasestro:factor(age)0 0.44896271 0.072553106 6.1880563 6.474653e-10
diseasediab:factor(age)1 0.83733434 0.128901534 6.4959223 8.884711e-11
diseasearth:factor(age)1 1.32865873 0.143790133 9.2402636 3.299325e-20
diseasestro:factor(age)1 1.60530144 0.373531351 4.2976351 1.752351e-05
attr(,"class")
[1] "summary.binaddhazmod"
The (Intercept) represents the background, which was not
modelled by age. The diseases with factor(age)0 and
factor(age)1 represent the regression coefficients for age
0 (60-79 years) and age 1 (\(\geq\)80 years). The output above
shows that arthritis was not significant for the reference age category
(60-79years) (diseasearth:factor(age)0). Only the absolute
contribution was requested (type.attrib = "abs") and it can
be assessed with:
model3$contribution
att.abs
disab.0 0.277835632
backgrnd.0 0.251005540
diseasediab:factor(age)0.0 0.012575547
diseasearth:factor(age)0.0 0.002220039
diseasestro:factor(age)0.0 0.012034506
disab.1 0.493678649
backgrnd.1 0.206353130
diseasediab:factor(age)1.1 0.089832332
diseasearth:factor(age)1.1 0.158378063
diseasestro:factor(age)1.1 0.039115125
The attribution is presented by level of the attribution variable
(attrib.var), which in this example is age. The first five
rows show the results for attribution variable level 0, which in this
case represents age group 60-79 years and the last five rows represent
the results for attribution variable level 1 (\(\geq\)80 years). The results
indicate that the disability prevalence in the older women (49.4%) was
much larger than in the younger age group (27.8%). While the three
diseases included in the model showed a low contribution to the
disability prevalence in women aged 60-79 years (<1.5%), arthritis
was by far the most important disease contributing to the disability
prevalence in the oldest women (15.9%).
The same matrix of diseases defined for model 3 will be used in model 4. This model can be fitted with the function:
model4 <- BinAddHaz(dis.bin ~ factor(age) - 1 + disease:factor(age), data = disabData,
weights = wgt, attrib = TRUE, attrib.var = age, type.attrib = "abs",
collapse.background = FALSE, attrib.disease = TRUE, seed = 111,
bootstrap = TRUE, conf.level = 0.95, nbootstrap = 1000,
parallel = TRUE, type.parallel = "snow", ncpus = 4)
The -1 in the model.formula is used to
obtain a different parameterization than the default: here we obtain the
parameter estimates for all the age groups, including the reference
category. Since we want to estimate the contributions for background by
age, the argument collapse.background is set to
FALSE. If this argument would be set to TRUE,
only one background, common to all age groups, would be estimated. Since
the contributions of diseases should be estimated by age group, the
argument attrib.disease was set to TRUE. The
parallel option for the bootstrap CI was used
(parallel = TRUE) on a Windows computer
(type.parallel = "snow") with 4 cores
(ncpus = 4). The option seed = 111 allows the
user to specify a random number (in this case 111) to make the results
of the bootstrapping reproducible. The summary of the model can be
checked with:
summary(model4)
$call
BinAddHaz(formula = dis.bin ~ factor(age) - 1 + disease:factor(age), data = disabData,
weights = wgt, attrib = TRUE, attrib.var = age, collapse.background = FALSE,
attrib.disease = TRUE, type.attrib = "abs", seed = 111, bootstrap = TRUE,
conf.level = 0.95, nbootstrap = 1000, parallel = TRUE,
type.parallel = "snow", ncpus = 4)
$bootstrap
[1] TRUE
$coefficients
Estimate CILow CIHigh
factor(age)0 0.22661055 0.201218693 0.2568179
factor(age)1 0.94910725 0.784733478 1.1292937
factor(age)0:diseasediab 0.12749849 0.056516120 0.2036685
factor(age)1:diseasediab 0.24124648 -0.181208625 0.7471999
factor(age)0:diseasearth 0.06884402 0.009035558 0.1345238
factor(age)1:diseasearth 0.75879140 0.349882903 1.2234047
factor(age)0:diseasestro 0.48788899 0.255772550 0.8331633
factor(age)1:diseasestro 1.13333515 0.426477637 2.2240599
$conf.level
[1] 0.95
attr(,"class")
[1] "summary.binaddhazmod"
The output with the results of the model is shown for
factor(age)0, which represents the age group of 60-79
years, and factor(age)1, representing the age group of
\(\geq\)80 years. Diabetes was
not significant for age group 1, as the bootstrap CI includes the null
value, i.e. 0. To identify the most disabling diseases, two objects
(coef.age0 and coef.age1) with the
coefficients for each age group can be created and sorted in decreasing
order using:
coef.age0 <- model4$coefficients[seq(1, length(model4$coefficients), 2)]
coef.age1 <- model4$coefficients[seq(0, length(model4$coefficients), 2)]
sort(coef.age0, decreasing = TRUE)
factor(age)0:diseasestro factor(age)0 factor(age)0:diseasediab
0.48788899 0.22661055 0.12749849
factor(age)0:diseasearth
0.06884402
sort(coef.age1, decreasing = TRUE)
factor(age)1:diseasestro factor(age)1 factor(age)1:diseasearth
1.1333352 0.9491072 0.7587914
factor(age)1:diseasediab
0.2412465
Stroke was the most disabling disease in both age groups. Arthritis and diabetes showed the lowest disabling impact in women aged 60-79 years and \(\geq\)80 years, respectively.
Since only the absolute contribution was requested
(type.attrib = "abs"), the results for the absolute
contribution can be assessed with:
model4$contribution
att.abs CILow CIHigh
disab.0 0.24442949 0.24102940 0.24813627
backgrnd.0 0.19743946 0.19694283 0.19790210
factor(age)0:diseasediab.0 0.02141352 0.01956557 0.02342391
factor(age)0:diseasearth.0 0.01253887 0.01153068 0.01363192
factor(age)0:diseasestro.0 0.01303764 0.01003208 0.01636211
disab.1 0.69484947 0.68537515 0.70574268
backgrnd.1 0.54868976 0.53993448 0.55643174
factor(age)1:diseasediab.1 0.02637877 0.02080951 0.03249654
factor(age)1:diseasearth.1 0.09137348 0.07746954 0.10759760
factor(age)1:diseasestro.1 0.02840746 0.01932950 0.03894183
The CILow and CIHigh refers to the 2.5th and 97.5th percentiles of
the 1,000 bootstrap replicates, since the bootstrap CI was requested
(bootstrap = TRUE) with conf.level = 0.95. To
identify the main contributors to the disability burden, two objects
(one for each age group) can be defined with the absolute contribution
and bootstrap CI using:
cont.age0 <- model4$contribution[c(1:5), ]
cont.age1 <- model4$contribution[c(6:10), ]
cont.age0[order(cont.age0[, 1], decreasing = TRUE), ]
att.abs CILow CIHigh
disab.0 0.24442949 0.24102940 0.24813627
backgrnd.0 0.19743946 0.19694283 0.19790210
factor(age)0:diseasediab.0 0.02141352 0.01956557 0.02342391
factor(age)0:diseasestro.0 0.01303764 0.01003208 0.01636211
factor(age)0:diseasearth.0 0.01253887 0.01153068 0.01363192
cont.age1[order(cont.age1[, 1], decreasing = TRUE), ]
att.abs CILow CIHigh
disab.1 0.69484947 0.68537515 0.70574268
backgrnd.1 0.54868976 0.53993448 0.55643174
factor(age)1:diseasearth.1 0.09137348 0.07746954 0.10759760
factor(age)1:diseasestro.1 0.02840746 0.01932950 0.03894183
factor(age)1:diseasediab.1 0.02637877 0.02080951 0.03249654
According to the results, the disability prevalence in the oldest women (69.5%) was 2.8 times larger than in women aged 60-79 years (24.4%). The background was the main contributor to the disability prevalence in both age groups. Among the chronic conditions, diabetes was the main contributor to the disability prevalence in women aged 60-79 years (2.1%) while arthritis contributed most to the disability burden in older women (9.1%).
In model 5, the independence assumption (assumption vii) is violated and two-way interactions between diseases are included in the model. In total, 6 parameters and the intercept will be estimated in model 5. The model can be fitted by:
model5 <- BinAddHaz(dis.bin ~ (diab + arth + stro)^2, data = disabData, weights = wgt,
attrib = TRUE, collapse.background = FALSE, attrib.disease = FALSE,
type.attrib = "both")
summary(model5)
$`call`
BinAddHaz(formula = dis.bin ~ (diab + arth + stro)^2, data = disabData, weights = wgt,
attrib = TRUE, collapse.background = FALSE, attrib.disease = FALSE,
type.attrib = "both")
$bootstrap
[1] FALSE
$coefficients
Estimate StdErr t.value p.value
(Intercept) 0.2988163 0.009586541 31.1703950 1.803828e-198
diab 0.1178815 0.026104065 4.5158287 6.422107e-06
arth 0.1101293 0.023712731 4.6443118 3.481543e-06
stro 0.8145107 0.116867992 6.9694938 3.505379e-12
diab:arth 0.1563155 0.067097034 2.3296932 1.985387e-02
diab:stro -0.5072111 0.154344770 -3.2862213 1.020980e-03
arth:stro -0.1494752 0.176853232 -0.8451937 3.980349e-01
attr(,"class")
[1] "summary.binaddhazmod"
The main effects of all the diseases were significant, but only the interaction between diabetes and arthritis and between diabetes and stroke were significant. The negative disabling impact of the interaction term between diabetes and stroke should be carefully interpreted, as it is based on a small sample size (\(n = 91\)) (Table 3).
To illustrate the use of the function LRTest to perform
the likelihood ratio test (LRT) for model selection, models 2
(model2) and 4 (model4) are compared. The LRT
can be performed with:
LRTest(model4, model2)
Likelihood ratio test
Model 1:
dis.bin ~ factor(age) - 1 + disease:factor(age)
Model 2:
dis.bin ~ factor(age) - 1 + diab + arth + stro
Res.df Res.Dev df Deviance Pr(>Chi)
1 6286 6916.818
2 6289 6946.224 3 29.405 1.8407e-06
The output shows the models that are being compared:
Model 1 is the model with the interactions with diseases
and age (previous model 4) and Model 2 is the model without
the interactions between diseases and age (previous model 2). The
degrees of freedom for each model (Res.df), the residual
deviance, i.e. the 2*log-likelihood of each model
(Res.Dev), the difference in the degrees of freedom between
the models (df), the difference between the
2*log-likelihood of the models, i.e. the value of the likelihood ratio
test statistic (Deviance), and the p-value of the test
statistic, based on the \(\chi^2\)
distribution (Pr(>Chi)) are presented. Since the test
was statistically significant at 0.05 significance level, model 4, which
includes interaction between diseases and age, fits the data better than
model 2.
To fit the multinomial additive hazard model and to estimate the
contribution of chronic conditions to the disability burden for
multinomial outcomes, the function MultAddHaz can be used.
As an illustration, two models were fitted: model 6 - with only
background by age; and model 7 - with background and diseases by age,
with bootstrap CI.
Model 6 can be fitted with the function:
model6 <- MultAddHaz(dis.mult ~ factor(age) - 1 + diab + arth + stro, data = disabData,
weights = wgt, attrib = TRUE, seed = 111, collapse.background = FALSE,
attrib.disease = FALSE, type.attrib = "both")
The results of the model can be visualized using:
summary(model6)
$`call`
MultAddHaz(formula = dis.mult ~ factor(age) - 1 + diab + arth +
stro, data = disabData, weights = wgt, attrib = TRUE, seed = 111,
collapse.background = FALSE, attrib.disease = FALSE, type.attrib = "both")
$bootstrap
[1] FALSE
$coefficients
Estimate StdErr t.value p.value
factor(age)0.y1 0.117959944 0.006083174 19.3911842 2.109290e-81
factor(age)1.y1 0.285541582 0.022330639 12.7869869 5.585601e-37
diab.y1 0.002717701 0.011329960 0.2398685 8.104400e-01
arth.y1 0.015602747 0.011534798 1.3526675 1.762105e-01
stro.y1 0.024923984 0.025498946 0.9774515 3.283833e-01
factor(age)0.y2 0.107643802 0.005969496 18.0323096 6.503330e-71
factor(age)1.y2 0.817043178 0.043161129 18.9300697 9.103853e-78
diab.y2 0.124326766 0.017245323 7.2093036 6.283854e-13
arth.y2 0.063767963 0.014095689 4.5239336 6.181719e-06
stro.y2 0.499262854 0.064799754 7.7047030 1.514581e-14
attr(,"class")
[1] "summary.multaddhazmod"
In the above output, the results identified with \(y1\) refer to the outcome (dis.mult) = 1
(mild disability) and the results with \(y2\) refer to the outcome (dis.mult) = 2
(severe disability). For mild disability only the background
(factor(age)0.y1) and (factor(age)1.y1) was
significant while all the diseases and the background were signifcant
for women with severe disability. Similar to the binomial model, the
most disabling diseases can be identified by:
coef.mild <- model6$coefficients[1:5, ]
coef.sev <- model6$coefficients[6:10, ]
sort(coef.mild, decreasing = TRUE)
factor(age)1.y1 factor(age)0.y1 stro.y1 arth.y1 diab.y1
0.285541582 0.117959944 0.024923984 0.015602747 0.002717701
sort(coef.sev, decreasing = TRUE)
factor(age)1.y2 stro.y2 diab.y2 factor(age)0.y2 arth.y2
0.81704318 0.49926285 0.12432677 0.10764380 0.06376796
Background and stroke showed the highest disabling impact for mild and severe disability. The relative and absolute contributions can be checked with:
model6$contribution
$`att.rel`
att.rel
factor(age)0.y1 0.804099194
factor(age)1.y1 0.164283693
diab.y1 0.003665546
arth.y1 0.023317949
stro.y1 0.004633619
factor(age)0.y2 0.426874738
factor(age)1.y2 0.336655536
diab.y2 0.105566151
arth.y2 0.058984332
stro.y2 0.071919244
$att.abs
att.abs
disab.y1 0.1265087428
factor(age)0.y1 0.1017255781
factor(age)1.y1 0.0207833234
diab.y1 0.0004637236
arth.y1 0.0029499244
stro.y1 0.0005861933
disab.y2 0.1901766667
factor(age)0.y2 0.0811816147
factor(age)1.y2 0.0640240276
diab.y2 0.0200762187
arth.y2 0.0112174436
stro.y2 0.0136773621
It is interesting to note that the severe disability prevalence (19.0%) was 1.5 times higher than the mild disability prevalence (12.7%). The results for the relative contribution can be sorted in decreasing order by:
rel.cont.mild <- model6$contribution$att.rel[1:5, ]
rel.cont.sev <- model6$contribution$att.rel[6:10, ]
sort(rel.cont.mild, decreasing = TRUE)
factor(age)0.y1 factor(age)1.y1 arth.y1 stro.y1 diab.y1
0.804099194 0.164283693 0.023317949 0.004633619 0.003665546
sort(rel.cont.sev, decreasing = TRUE)
factor(age)0.y2 factor(age)1.y2 diab.y2 stro.y2 arth.y2
0.42687474 0.33665554 0.10556615 0.07191924 0.05898433
The background was the main contributor to the disability burden, representing 96.8% (0.80 + 0.16) and 76.4% (0.43 + 0.34) of the mild and severe disability prevalence, respectively. Arthritis (2.3%) was the main contributor to the mild disability prevalence while diabetes (10.6%) contributed most to the severe disability prevalence.
The matrix with the diseases (disease) defined for model
3 is used to fit model 7:
model7 <- MultAddHaz(dis.mult ~ factor(age) -1 + disease:factor(age),
data = disabData, weights = wgt, attrib = TRUE, attrib.var = age,
seed = 111, collapse.background = FALSE, attrib.disease = TRUE,
type.attrib = "both", bootstrap = TRUE, conf.level = 0.95,
nbootstrap = 1000, parallel = TRUE, type.parallel = "snow",
ncpus = 4)
The -1 was added to the model.formula to
obtain the parameter estimates for the background for all age groups,
including the reference category. Since the background should be
estimated by age, collapse.background is set to
FALSE. Additionally, attrib.disease is set to
TRUE, as interactions between age and diseases were
included in the model and the contribution of diseases should be
estimated by age. The seed argument in
MultAddHaz is used to obtain reproducible results for the
starting values used in the constrained optimization, which are randomly
generated, and for the bootstrap CI. Besides the summary
function, the disabling impacts and the bootstrap CI can also be
assessed with:
cbind(model7$coefficients, model7$ci)
Coefficients CILow CIHigh
factor(age)0.y1 0.117255379 0.10064630 0.13689281
factor(age)1.y1 0.277818866 0.20558349 0.37304023
factor(age)0:diseasediab.y1 0.005926107 -0.03440926 0.04770883
factor(age)1:diseasediab.y1 -0.027900849 -0.16964726 0.15898841
factor(age)0:diseasearth.y1 0.012814735 -0.02467113 0.05156243
factor(age)1:diseasearth.y1 0.110733103 -0.08447433 0.30975351
factor(age)0:diseasestro.y1 0.032163873 -0.03657685 0.14572706
factor(age)1:diseasestro.y1 -0.028504716 -0.22807053 0.22952796
factor(age)0.y2 0.109165655 0.09146724 0.13167070
factor(age)1.y2 0.672941316 0.53792263 0.83185332
factor(age)0:diseasediab.y2 0.121508443 0.06618176 0.17864913
factor(age)1:diseasediab.y2 0.282986455 -0.09742139 0.80712949
factor(age)0:diseasearth.y2 0.054335292 0.01095538 0.09957643
factor(age)1:diseasearth.y2 0.635463641 0.31671319 1.05424237
factor(age)0:diseasestro.y2 0.456594023 0.24959719 0.72863913
factor(age)1:diseasestro.y2 1.233578243 0.49988818 2.27216717
In the output, factor(age)0 and
factor(age)1 refers to the age groups 60-79 years and \(\geq\) 80 years, respectively.
y1 refers to disability category 1, which here represents
mild disability and y2 refers to disability category 2,
representing severe disability.
Two coefficients (for diabetes and stroke in women aged \(\geq\) 80 years with mild disability) were negative. This suggests a “protective" effect of these conditions. However, these results should be carefully interpreted as they were not statistically significant.
To identify the most disabling diseases for mild and severe disability by age group, the following code can be used:
mild.age0 <- model7$coefficients[seq(1, length(model7$coefficients), 2), ][1:4]
sev.age0 <- model7$coefficients[seq(1, length(model7$coefficients), 2), ][5:8]
mild.age1 <- model7$coefficients[seq(0, length(model7$coefficients), 2), ][1:4]
sev.age1 <- model7$coefficients[seq(0, length(model7$coefficients), 2), ][5:8]
mild.age0[order(mild.age0, decreasing = TRUE)]
factor(age)0.y1 factor(age)0:diseasestro.y1
0.117255379 0.032163873
factor(age)0:diseasearth.y1 factor(age)0:diseasediab.y1
0.012814735 0.005926107
sev.age0[order(sev.age0, decreasing = TRUE)]
factor(age)0:diseasestro.y2 factor(age)0:diseasediab.y2
0.45659402 0.12150844
factor(age)0.y2 factor(age)0:diseasearth.y2
0.10916565 0.05433529
mild.age1[order(mild.age1, decreasing = TRUE)]
factor(age)1.y1 factor(age)1:diseasearth.y1
0.27781887 0.11073310
factor(age)1:diseasediab.y1 factor(age)1:diseasestro.y1
-0.02790085 -0.02850472
sev.age1[order(sev.age1, decreasing = TRUE)]
factor(age)1:diseasestro.y2 factor(age)1.y2
1.2335782 0.6729413
factor(age)1:diseasearth.y2 factor(age)1:diseasediab.y2
0.6354636 0.2829865
Stroke was the most disabling disease in women with severe disability in both age groups and in women aged 60-79 years with mild disability while arthritis was the most disabling disease in women aged \(\geq\) 80 years with mild disability.
The main contributors to the disability burden, based on the absolute contribution can be assessed with:
cont.mild.age0 <- model7$contribution$att.abs[1:5, ]
cont.sev.age0 <- model7$contribution$att.abs[6:10, ]
cont.mild.age1 <- model7$contribution$att.abs[11:15, ]
cont.sev.age1 <- model7$contribution$att.abs[16:20, ]
cont.mild.age0[order(cont.mild.age0[, 1], decreasing = TRUE), ]
att.abs CILow CIHigh
disab.0.y1 0.1146399721 0.1142911352 0.115027470
backgrnd.0.y1 0.1103973843 0.1103736506 0.110418787
factor(age)0:diseasearth.0.y1 0.0024598331 0.0022352202 0.002706063
factor(age)0:diseasediab.0.y1 0.0010238914 0.0009230407 0.001137036
factor(age)0:diseasestro.0.y1 0.0007588633 0.0005256459 0.001035586
cont.sev.age0[order(cont.sev.age0[, 1], decreasing = TRUE), ]
att.abs CILow CIHigh
disab.0.y2 0.137612800 0.133986991 0.14160126
backgrnd.0.y2 0.095139459 0.094884399 0.09537052
factor(age)0:diseasediab.0.y2 0.020450103 0.018538859 0.02259582
factor(age)0:diseasestro.0.y2 0.012179369 0.009234356 0.01571648
factor(age)0:diseasearth.0.y2 0.009843869 0.008984070 0.01077563
cont.mild.age1[order(cont.mild.age1[, 1], decreasing = TRUE), ]
att.abs CILow CIHigh
disab.1.y1 0.2532173709 0.248985442 0.257917633
backgrnd.1.y1 0.2408954310 0.240163632 0.241550122
factor(age)1:diseasearth.1.y1 0.0170621273 0.012596001 0.022104725
factor(age)1:diseasestro.1.y1 -0.0006391061 -0.001067613 -0.000299713
factor(age)1:diseasediab.1.y1 -0.0041010813 -0.005510841 -0.002757878
cont.sev.age1[order(cont.sev.age1[, 1], decreasing = TRUE), ]
att.abs CILow CIHigh
disab.1.y2 0.52797382 0.51474229 0.54101616
backgrnd.1.y2 0.38747404 0.38073529 0.39414511
factor(age)1:diseasearth.1.y2 0.07581147 0.06237721 0.09017923
factor(age)1:diseasestro.1.y2 0.03293044 0.02088364 0.04734430
factor(age)1:diseasediab.1.y2 0.03175788 0.02394332 0.04002170
The severe disability prevalence (60-79 years = 13.8%; \(\geq\)80 years = 52.8%) was larger than the mild disability prevalence (60-79 years = 11.5%; \(\geq\)80 years = 25.3%) in both age groups. Arthritis was the main contributor to the mild disability prevalence in both age groups and to the severe disability prevalence in women aged \(\geq\)80 years, while diabetes was the main contributor to the severe disability prevalence in women aged 60-79 years.
In this paper we introduced the R package addhaz developed to fit the binomial and multinomial additive hazard models to estimate the contribution of diseases to the disability prevalence using cross-sectional data.
The R package addhaz was developed based on the R functions
developed by Nusselder and Looman (W. Nusselder
and Looman 2010) for binomial disability outcomes and for non-R
users. The main advantages of addhaz compared to the original R
functions are: (i) option to use the attribution method for multinomial
responses using the function MultAddHaz; and (ii) option to
do parallel computing for the calculation of the bootstrap percentile
confidence intervals. However, the possibility to use reduced rank
regression (T. Yee 2014) to estimate the
cause-specific disability rates by age group, for example, which is
available in the original R functions (W.
Nusselder and Looman 2010), is not available in addhaz.
Nonetheless, in addhaz these interactions can be estimated by
including full interaction terms between chronic conditions and age
groups.
Although the parameter estimates of the binomial additive hazard model can also be obtained with the R package logbin, the contribution of diseases to the disability prevalence is not provided, since logbin was not developed with focus on the attribution method. Therefore, for analysis aimed at the attribution of disability to diseases, we recommend the use of addhaz. For multinomial outcomes, no other software is available to fit the multinomial additive hazard model and to calculate the contributions, to our knowledge.
One could argue that instead of using the multinomial model, two binomial models could be fitted: (i) no x mild disability; and (ii) no x severe disability. Although this is indeed possible, with a minor loss of precision (larger standard errors) for the parameter estimates when the reference category (“no disability", in our example) is the most frequent category in the population (which is the case for the subset of the BNHS used here, as 67% were not disabled, 13% reported mild disability, and 20% were severely disabled) (Agresti 2002), the prevalence of the various disability categories do not sum to 100%, as can be observed in a previous study that assessed the difference in the mild and severe disability burden using two binomial models (R. Yokota et al. 2015).
Different models with different options were presented using addhaz, showing a wide possibility of application to the users. One example is the investigation of the role of multimorbidity on the disability prevalence, which was assessed by the inclusion of two-way interactions between diseases in the model, as presented in model 2. Even though the examples only included the combination of two diseases, higher order interactions can also be included in the models, with the sample size being the limiting factor. In addition, since the prevalence of chronic conditions tends to increase over age, the model parameterization to include interactions between diseases and age groups was also shown for the binary (models 3 and 4) and multinomial disability outcomes (model 7). Although age group was used as the stratification variable to estimate the disabling impacts of the diseases and background, other variables can be used, such as education attainment and sex.
Furthermore, we illustrated how the likelihood ratio test (LRT) can
be performed for model selection using the function LRTest.
The LRT can be also performed for model selection with the multinomial
additive hazard model.
The attribution method has some limitations that should be considered. The main limitation of the method is the causality assumption. Although a causal relationship between diseases and disability is plausible and has been proposed in several disability models (Verbrugge and Jette 1994), it cannot be assessed with cross-sectional data. As a consequence, disability is incorrectly attributed to diseases when disability occurred before the diseases. Although the parallel option reduces significantly computation time for calculating bootstrap confidence intervals, fitting the multinomial model to high dimensional data can still be time-consuming. For example, the computational time to fit model 7, in a Windows computer Intel(R) Core (TM) i7-4600 CPU with 2.1GHz and 2.7GHz, 8GB (RAM), using the parallel option with 4 cores, was 23.04 hours.
In conclusion, addhaz is a publicly available tool to assess the contribution of chronic conditions to the disability prevalence, using cross-sectional data. The results produced by the tool can be used by policymakers to reduce the disability burden. Future areas of interest to improve the package include the extension of the multinomial model to ordinal responses and alternatives to reduce computation time for high dimensional data.