Abstract
The Autoregressive Distributed Lag approach to cointegration or bound testing, proposed by Pesaran in 2001, has become prominent in empirical research. Although this approach has many advantages over the classical cointegration tests, it is not exempt from drawbacks, such as possible inconclusive inference and distortion in size. Recently, Bertelli and coauthors developed a bootstrap approach to the bound tests to overcome these drawbacks. This paper introduces the R package bootCT, which implements this method by deriving the bootstrap versions of the bound tests and of the asymptotic F-test on the independent variables proposed by Sam and coauthors in 2019. As a spinoff, a general method for generating random multivariate time series following a given VECM/ARDL structure is provided in the package. Empirical applications showcase the main functionality of the package.
Cointegration and error correction are fundamental concepts in the
analysis of economic data, insofar as they provide an appropriate
framework for testing economic hypotheses about growth and fluctuation.
Several approaches have been proposed in the literature to determine
whether two or more non-stationary time series are cointegrated, meaning
they share a common long-run relationship.
There are two basic types of tests for cointegration: single equation
tests and VAR-based tests. The former check the presence of unit roots
in cointegration residuals (see, e.g., Robert F.
Engle and Granger 1987; Robert F. Engle and Yoo 1987; Mackinnon 1991;
Gabriel, Psaradakis, and Sola 2002; Cook 2006) or test the
significance of the error-correction (EC) term coefficient (Kremers, Ericsson, and Dolado 1992; Maddala and Kim
1998; Arranz and Escribano 2000; Ericsson and MacKinnon 2002).
The latter, such as the Johansen (1991)
approach, tackle the problem of detecting cointegrating relationships in
a VAR model. This latter approach, albeit having the advantage of
avoiding the issue of normalization, as well as allowing the detection
of multiple cointegrating vectors, is far from being perfect. In the VAR
system all variables are treated symmetrically, as opposed to the
standard univariate models that usually have a clear interpretation in
terms of exogenous and endogenous variables. Furthermore, in a VAR
system all the variables are estimated at the same time, which is
problematic if the relation between some variables is flawed, that is
affected by some source of error. In this case a simultaneous estimation
process tends to propagate the error affecting one equation to the
others. Furthermore, a multidimensional VAR models employs plenty of
degrees of freedom.
The recent cointegration approach, known as Autoregressive Distributed
Lag (ARDL) approach to cointegration or bound testing, proposed by
Pesaran, Shin, and Smith (2001) (PSS),
falls in the former strand of literature. It has become prominent in
empirical research because it shows several advantages with respect to
traditional methods for testing cointegration. First, it is applicable
also in cases of mixed order integrated variables, albeit with
integration not exceeding the first order. Thus, it evades the necessity
of pre-testing the variables and, accordingly, avoids some common
practices that may prevent finding cointegrating relationships, such as
dropping variables or transforming them into stationary form (see McNown, Sam, and Goh 2018). Second,
cointegration bound tests are performed in an ARDL model that allows
different lag orders for each variable, thus providing a more flexible
framework than other commonly employed approaches. Finally, unlike other
cointegration techniques, which are sensitive to the sample size, the
ARDL approach provides robust and consistent results for small sample
sizes.
Notably, the ARDL bound testing methodology has quickly spread in
economics and econometrics to study the cointegrating relationships
between macroeconomic and financial variables, to evaluate the long-run
impact of energy variables, or to assess recent environmental policies
and their impact on the economy. Among the many applications, see for
instance Haseeb et al. (2019; Reda and Nourhan
2020; Menegaki 2019; Yilanci, Bozoklu, and Gorus 2020; Hussain et al.
2019; Abbasi et al. 2021).
The original bound tests proposed by Pesaran,
Shin, and Smith (2001) are an \(F\)-test for the significance of the
coefficients of all lagged level variables entering the error correction
term (\(F_{ov}\)), and a \(t\)-test for the coefficient of the lagged
dependent variable. When either the dependent or the independent
variables do not appear in the long-run relationship, a degenerate case
arises. The bound \(t\)-test provides
answers on the occurrence of a degenerate case of second type, while the
occurrence of a degeneracy case of first type can be assessed by testing
whether the dependent variable is of integration order I(1). This type
of check violates the spirit and motivation of the bound tests, which
are supposed to be applicable in situations of unknown order of
integration for the variables.
Recently, McNown, Sam, and Goh (2018)
pointed out how, due to the low power problem of unit root tests,
investigating the presence of a first type degeneracy by testing the
integration order of the dependent variable may lead to incorrect
conclusions. Therefore, they suggested checking for its occurrence by
testing the significance of the lagged levels of the independent
variables via an extra \(F\)-test
(\(F_{ind}\)), which was also worked
out in its asymptotic version [SMK; Sam, McNown,
and Goh (2019)].
Besides problems in testing the occurrence of degenerate cases, in
general, the main drawback of the bound tests is the occurrence of
potentially inconclusive results, if the test statistic lies between the
bounds of the test distribution under the null. Furthermore, the
asymptotic distributions of the statistics may provide a poor
approximation of the true distributions in small samples. Finite sample
critical values, even if only for a subset of all possible model
specifications, have been worked out in the literature (see Mills and Pentecost 2001; Narayan and Smyth 2004;
Kanioura and Turner 2005; Narayan 2005), while (Kripfganz and Schneider 2020) provided the
quantiles of the asymptotic distributions of the tests as functions of
the sample size, the lag order and the number of long-run forcing
variables. However, this relevant improvement does not eliminate the
uncertainty related to the inconclusive regions, or the existence of
other critical issues related to the underlying assumptions of the bound
test framework, such as the (weak) exogeneity of the independent
variables or the non-stationarity of the dependent variable.
To overcome the mentioned bound test drawbacks, (Bertelli, Vacca, and Zoia 2022) proposed
bootstrapping the ARDL cointegration test. Inference can always be
pursued with ARDL bootstrap tests, unlike what happens with both the PSS
tests and the SMK test on the independent variables. Bootstrap ARDL
tests were first put forward by (McNown, Sam, and
Goh 2018) in an unconditional ARDL model, which omits the
instantaneous differences of the exogenous variables in the ARDL
equation, rather than a conditional one, as originally proposed by (Pesaran, Shin, and Smith 2001). The
unconditional model is often used, for reason of practical convenience,
in empirical research. Simulation results in (Bertelli, Vacca, and Zoia 2022) have
highlighted the importance of employing the appropriate specification,
especially under degenerate cases. In fact, it has been pointed out that
a correct detection of these cases requires the comparison of the test
outcomes in both the conditional and unconditional settings. Erroneous
conclusions, based exclusively on one model specification, can thus be
avoided.
In this paper, bootstrap bound tests, thereby including the bootstrap
versions of the \(F_{ov}\), \(t\) and \(F_{ind}\) bound tests, are carried out in a
conditional ARDL model setting. This approach allows to overcome the
problem of inconclusive regions of the standard bound tests. A
comparison with the outcomes engendered by the unconditional ARDL
bootstrap tests is nevertheless provided for the \(F_{ind}\) test, to avoid erroneous
inference in presence of degenerate cases.
The paper is organized as follows. Section 2 introduces the theoretical results of
the ARDL cointegration bound tests. Section 3
details the steps carried out by the bootstrap procedure, which allows
the construction of the (bootstrap) distribution - under the null - for
the \(F_{ov}\), \(t\), conditional \(F_{ind}\) and unconditional \(F_{ind}\) tests. Section 4 introduces the R package bootCT
(Vacca and Bertelli 2023) and its
functionalities: a method for the generation of random multivariate time
series that follow a user-specified VECM/ARDL structure, with some
examples, and the main function that carries out the aforementioned
bootstrap tests, while also computing the PSS and SMK bound tests. The
trade-off between accuracy and computational time of the bootstrap
procedure is also investigated, under several scenarios in terms of
sample size and number of replications. Notably, a function that
performs the PSS bound tests is already available in the dynamac
package (Jordan and Philips 2020), while
no R routine has so far been implemented for the SMK test,
to the best of our knowledge. Section 5 gives
some empirical applications that employ the core function of the package
and its possible outputs. Section 6 concludes.
Appendix 7 briefly delves into technical
details of the conditional ARDL model and its possible specifications 1.
The starting point of the approach proposed by (Pesaran, Shin, and Smith 2001) is a \((K+1)\) VAR(\(p\)) model \[
\mathbf{A}(L)(\mathbf{z}_t-\boldsymbol{\mu}-\boldsymbol{\eta}t)=\boldsymbol{\varepsilon}_t
\enspace \enspace \enspace \boldsymbol{\varepsilon}_t\sim N(\mathbf{0},
\boldsymbol{\Sigma}),\qquad\mathbf{A}(L)=\left(\mathbf{I}_{K+1}- \sum_{j=1}^{p}\mathbf{A}_j\mathbf{L}^j\right)
\enspace \enspace \enspace t=1,2,\dots,T. (\#eq:var)\] Here,
\(\mathbf{A}_j\) are square \((K+1)\) matrices, \(\mathbf{z}_t\) a vector of \((K+1)\) variables, \(\boldsymbol{\mu}\) and \(\boldsymbol{\eta}\) are \((K+1)\) vectors representing the drift and
the trend respectively, and \(\det(\mathbf{A}(z))=0\) for \(|z| \geq 1\). If the matrix \(\mathbf{A}(1)=\mathbf{I}_{K+1}-\sum_{j=1}^{p}\mathbf{A}_{j}\)
is singular, the components of \(\mathbf{z}_t\) turn out to be integrated
and possibly cointegrated.
The VECM representation of @ref(eq:var) is given by (see Appendix 7.1 for details) \[
\Delta\mathbf{z}_t=\boldsymbol{\alpha}_{0}+\boldsymbol{\alpha}_{1}t-\mathbf{A}(1)\mathbf{z}_{t-1}+\sum_{j=1}^{p-1}\boldsymbol{\Gamma}_{j}\Delta
\mathbf{z}_{t-j}+\boldsymbol{\varepsilon}_t. (\#eq:vecm)\] Now,
to study the adjustment to the equilibrium of a single variable \(y_t\), given the other \(\mathbf{x}_t\) variables, the vectors \(\mathbf{z}_t\) and \(\boldsymbol{\varepsilon}_t\) are
partitioned \[
\mathbf{z}_t=\begin{bmatrix}
\underset{(1,1)}{y_{t}} \\ \underset{(K,1)}{\mathbf{x}_{t}}
\end{bmatrix}, \enspace \enspace \enspace
\boldsymbol{\varepsilon}_t=\begin{bmatrix}
\underset{(1,1)}{\varepsilon_{yt}}
\\ \underset{(K,1)}{\boldsymbol{\varepsilon}_{xt}}
\end{bmatrix}. (\#eq:vecpart)\] The matrix \(\mathbf{A}(1)\), which is assumed to be
singular to allow cointegration, is partitioned conformably to \(\mathbf{z}_{t}\) as 2
\[\mathbf{A}(1)=\begin{bmatrix}
\underset{(1,1)}{a_{yy}} & \underset{(1,K)}{\mathbf{a}_{yx}'} \\
\underset{(K,1)}{\mathbf{a}_{xy}} &
\underset{(K,K)}{\mathbf{A}_{xx}}
\end{bmatrix}.\] Under the assumption \[
\boldsymbol{\varepsilon}_t \sim N\Bigg(\mathbf{0}, \begin{bmatrix}
\underset{(1,1)}{\sigma_{yy}}&
\underset{(1,K)}{\boldsymbol{\sigma}_{yx}'} \\
\underset{(K,1)}{\boldsymbol{\sigma}_{xy}} &
\underset{(K,K)}{\boldsymbol{\Sigma}_{xx}}
\end{bmatrix}\Bigg), (\#eq:normerr)\] the following holds \[
\varepsilon_{yt}=\boldsymbol{\omega}'\boldsymbol{\varepsilon}_{xt}+\nu_{yt}
\sim N(0,\sigma_{y.x}), (\#eq:epsilonx)\] where \(\sigma_{y.x}=\sigma_{yy}-\boldsymbol{\omega}'\boldsymbol{\sigma}_{xy}\)
with \(\boldsymbol{\omega}'=\boldsymbol{\sigma}'_{yx}\boldsymbol{\Sigma}^{-1}_{xx}\),
and \(\nu_{yt}\) is independent of
\(\boldsymbol{\varepsilon}_{xt}\).
Substituting @ref(eq:epsilonx) into @ref(eq:vecm) and assuming that the
\(\mathbf{x}_{t}\) variables are
exogenous towards the ARDL parameters (that is, setting \(\mathbf{a}_{xy}=\mathbf{0}\) in \(\mathbf{A}(1)\)) yields the system (see
Appendix 7.1 for details) \[
\Delta y_{t}=\alpha_{0.y}+\alpha_{1.y}t -a_{yy}EC_{t-1}+
\sum_{j=1}^{p-1}\boldsymbol{\gamma}'_{y.x,j}\Delta\mathbf{z}_{t-j}+\boldsymbol{\omega}'\Delta\mathbf{x}_{t}+\nu_{yt} (\#eq:ardl)\]
\[ \Delta\mathbf{x}_{t} = \boldsymbol{\alpha}_{0x} +\boldsymbol{\alpha}_{1x}t+ \mathbf{A}_{(x)}\mathbf{z}_{t-1}+ \boldsymbol{\Gamma}_{(x)}(L)\Delta\mathbf{z}_t+ \boldsymbol{\varepsilon}_{xt}, (\#eq:marg)\] where \[ \boldsymbol\gamma_{y.x,j}'=\boldsymbol\gamma_{y,j}'-\boldsymbol{\omega}'\boldsymbol{\Gamma}_{(x),j} (\#eq:ardlgamma)\]
\[
\alpha_{0.y}=\alpha_{0y}-\boldsymbol{\omega}'\boldsymbol{\alpha}_{0x},
\enspace \enspace \enspace
\alpha_{1.y}=\alpha_{1y}-\boldsymbol{\omega}'\boldsymbol{\alpha}_{1x}, (\#eq:ardldet)\]
and where the error correction term, \(EC_{t-1}\), expressing the long-run
equilibrium relationship between \(y_{t}\) and \(\mathbf{x}_{t}\), is given by \[
EC_{t-1}=y_{t-1}-\theta_{0}-\theta_{1}t-\boldsymbol{\theta}'\mathbf{x}_{t-1}, (\#eq:ec)\]
with \[
\theta_{0}=\mu_{y}-\boldsymbol{\theta}'\boldsymbol{\mu}_{x},
\enspace
\theta_{1}=\eta_{y}-\boldsymbol{\theta}'\boldsymbol{\eta}_{x},
\enspace\boldsymbol{\theta}'=-\frac{\widetilde{\mathbf{a}'}_{y.x}}{a_{yy}}=-\frac{\mathbf{a}_{yx}'-\boldsymbol{\omega}'\mathbf{A}_{xx}}{a_{yy}}. (\#eq:const)\]
Thus, no cointegration occurs when \(\widetilde{\mathbf{a}}_{y.x}=\mathbf{0}\)
or \(a_{yy}=0\) . These two
circumstances are referred to as degenerate case of second and first
type, respectively. Degenerate cases imply no cointegration between
\(y_{t}\) and \(\mathbf{x}_{t}\).
To test the hypothesis of cointegration between \(y_{t}\) and \(\mathbf{x}_{t}\), Pesaran, Shin, and Smith (2001) proposed an
\(F\)-test, \(F_{ov}\) hereafter, based on the hypothesis
system \[\begin{align}
H_0: a_{yy}=0 \; \cap \;\widetilde{\mathbf{a}}_{y.x}=\mathbf{0}\\
H_1: a_{yy} \neq 0 \; \cup \;\widetilde{\mathbf{a}}_{y.x}\neq
\mathbf{0}.(\#eq:h0sys)
\end{align}\]
Note that \(H_{1}\) covers also the
degenerate cases \[\begin{align}
H_1^{y.x}: a_{yy}=0 \; , \;\widetilde{\mathbf{a}}_{y.x}\neq\mathbf{0}\\
H_1^{yy}: a_{yy} \neq 0 \; , \;\widetilde{\mathbf{a}}_{y.x} =
\mathbf{0}.(\#eq:h0deg)
\end{align}\]
The exact distribution of the \(F\)
statistic under the null is unknown, but it is limited from above and
below by two asymptotic distributions: one corresponding to the case of
stationary regressors, and another corresponding to the case of
first-order integrated regressors. As a consequence, the test is called
bound test and has an inconclusive area. 3
Pesaran, Shin, and Smith (2001) worked
out two sets of (asymptotic) critical values: one, \(\{\tau_{L,F}\}\), for the case when \(\mathbf{x}_{t}\sim{I}(0)\) and another,
\(\{\tau_{U,F}\}\), for the case when
\(\mathbf{x}_{t}\sim{I}(1)\). These
values vary in accordance with the number of regressors in the ARDL
equation, the sample size and the assumptions made about the
deterministic components (intercept and trend) of the data generating
process.
In this regard, Pesaran, Shin, and Smith
(2001) introduced five different specifications for the ARDL
model, depending on its deterministic components, which are (see
Appendix 7.2 for details)
No intercept and no trend \[\begin{align}
\Delta
y_t=-a_{yy}EC_{t-1}+\sum_{j=1}^{p-1}\boldsymbol{\gamma}'_{y.x,j}\Delta\mathbf{z}_{t-j}+\boldsymbol{\omega}'\Delta\mathbf{x}_{t}+\nu_{yt},(\#eq:case1)
\end{align}\]
where \(EC_{t-1}=y_{t-1}-\boldsymbol{\theta}'\mathbf{x}_{t-1}\),
Restricted intercept and no trend \[\begin{align}
\Delta y_{t}=
-a_{yy}EC_{t-1}+\sum_{j=1}^{p-1}\boldsymbol{\gamma}'_{y.x,j}\Delta\mathbf{z}_{t-j}+\boldsymbol{\omega}'\Delta\mathbf{x}_{t}+\nu_{yt},(\#eq:case2)
\end{align}\]
where \(EC_{t-1}=y_{t-1}-\theta_{0}-\boldsymbol{\theta}'\mathbf{x}_{t-1}\).
The intercept extracted from the EC term is \(\alpha_{0.y}^{EC} =
a_{yy}\theta_0\).
Unrestricted intercept and no trend \[\begin{align}
\Delta y_{t}
=\alpha_{0.y}-a_{yy}EC_{t-1}+\sum_{j=1}^{p-1}\boldsymbol{\gamma}'_{y.x,j}\Delta\mathbf{z}_{t-j}+\boldsymbol{\omega}'\Delta\mathbf{x}_{t}+\nu_{yt},(\#eq:case3)
\end{align}\]
where \(EC_{t-1}=y_{t-1}-\boldsymbol{\theta}'\mathbf{x}_{t-1}\).
Unrestricted intercept, restricted trend \[\begin{align}
\Delta y_{t}=
\alpha_{0.y}-a_{yy}EC_{t-1}+\sum_{j=1}^{p-1}\boldsymbol{\gamma}'_{y.x,j}\Delta\mathbf{z}_{t-j}+\boldsymbol{\omega}'\Delta\mathbf{x}_{t}+\nu_{yt},(\#eq:case4)
\end{align}\]
where \(EC_{t-1}=y_{t-1}-\theta_{1}t-\boldsymbol{\theta}'\mathbf{x}_{t-1}\).
The trend extracted from the EC term is \(\alpha_{1.y}^{EC} =
a_{yy}\theta_1\).
Unrestricted intercept, unrestricted trend \[\begin{align}
\Delta y_{t}
=\alpha_{0.y}+\alpha_{1.y}t
-a_{yy}EC_{t-1}+\sum_{j=1}^{p-1}\boldsymbol{\gamma}'_{y.x,j}\Delta\mathbf{z}_{t-j}+\boldsymbol{\omega}'\Delta\mathbf{x}_{t}+\nu_{yt},
(\#eq:case5)
\end{align}\]
where \(EC_{t-1}=y_{t-1}-\boldsymbol{\theta}'\mathbf{x}_{t-1}\).
The model in @ref(eq:ardl) proposed by Pesaran, Shin, and Smith (2001) represents the
correct framework in which to carry out bound tests. However, bound test
are often performed in an unconditional ARDL model setting, specified as
\[
\Delta y_{t}=\alpha_{0.y}+\alpha_{1.y}t -a_{yy}EC_{t-1}+
\sum_{j=1}^{p-1}\boldsymbol{\gamma}'_{j}\Delta\mathbf{z}_{t-j}+\varepsilon_{yt}, (\#eq:ardluc)\]
which omits the term \(\boldsymbol{\omega}'\Delta\mathbf{x}_{t}\).
(Bertelli, Vacca, and Zoia 2022) have
highlighted that bootstrap tests performed in these two ARDL
specifications can lead to contrasting results. To explain this
divergence, note that the conditional model makes use of the following
vector in the EC term \[\widetilde{\mathbf{a}}_{y.x}'=\mathbf{a}_{yx}'-\boldsymbol{\omega}'\mathbf{A}_{xx}\]
(divided by \(a_{yy}\), see
@ref(eq:const)) to carry out bound tests, while the unconditional one
only uses the vector \(\mathbf{a}_{yx}'\), (divided by \(a_{yy}\)), since it neglects the term \(\boldsymbol{\omega}'\mathbf{A}_{xx}\).
4 This
can lead to contrasting inference in two instances. The first happens
when a degeneracy of first type occurs in the conditional model, that is
\[
\widetilde{\mathbf{a}}_{y.x}'=\mathbf{0}, (\#eq:deg1cond)\]
because \[\mathbf{a}_{yx}'=\boldsymbol{\omega}'\mathbf{A}_{xx}.\]
In this case, the conditional model rejects cointegration, while the
unconditional one concludes the opposite. The other case happens when a
degeneracy of first type occurs in the unconditional model, that is
\[
\mathbf{a}_{yx}'=\mathbf{0}, (\#eq:deg1uc)\] but \[\widetilde{\mathbf{a}}_{y.x}'=\boldsymbol{\omega}'\mathbf{A}_{xx}
\neq \mathbf{0}.\] In this case, the unconditional model rejects
cointegration, while the conditional one concludes for the existence of
cointegrating relationships, which are however spurious. Only a
comparison of the outcomes of the \(F_{ind}\) test performed in both the
conditional and unconditional ARDL equation can help to disentangle this
problem. 5
In the following, bootstrap tests are carried out in the conditional
ARDL model @ref(eq:ardl). However, when a degeneracy of first type
occurs in the unconditional model, the outcomes of the \(F_{ind}\) bootstrap test performed in both
the conditional and unconditional settings are provided. This, as
previously outlined, is performed to avoid the acceptance of spurious
long-run relationships among the dependent variable and the independent
variables.
The bootstrap procedure here proposed focuses on a ARDL model
specified as in @ref(eq:case1)-@ref(eq:case5), depending on the
assumptions on the deterministic components.
The bootstrap procedure consists of the following steps:
The ARDL model is estimated via OLS and the related test statistics \(F_{ov}\), \(t\) or \(F_{ind}\) are computed.
In order to construct the distribution of each test statistic under the corresponding null, the same model is re-estimated imposing the appropriate restrictions on the coefficients according to the test under consideration.
Following (McNown, Sam, and Goh 2018), the ARDL restricted residuals are then computed. For example, under Case III, the residuals are \[ \widehat{\nu}_{yt}^{F_{ov}}=\Delta y_{t}-\widehat{\alpha}_{0.y}-\sum_{j=1}^{p-1}\widehat{\boldsymbol{\gamma}}_{y.x,j}'\Delta\mathbf{z}_{t-j}-\widehat{\boldsymbol{\omega}}'\Delta\mathbf{x}_{t} (\#eq:resfov)\]
\[ \widehat{\nu}_{yt}^{t}=\Delta y_{t}-\widehat{\alpha}_{0.y}+\widehat{\widetilde{\mathbf{a}}}'_{y.x}\mathbf{x}_{t-1}- \sum_{j=1}^{p-1}\widehat{\boldsymbol{\gamma}}_{y.x,j}'\Delta\mathbf{z}_{t-j}-\widehat{\boldsymbol{\omega}}'\Delta\mathbf{x}_{t} (\#eq:rest)\]
\[ \widehat{\nu}_{yt}^{F_{ind}}=\Delta y_{t}-\widehat{\alpha}_{0.y}+\widehat{a}_{yy}y_{t-1}- \sum_{j=1}^{p-1}\widehat{\boldsymbol{\gamma}}_{y.x,j}'\Delta\mathbf{z}_{t-j}-\widehat{\boldsymbol{\omega}}'\Delta\mathbf{x}_{t}. (\#eq:resfind)\] Here, the apex \("\widehat{\,\,.\,\,}"\) denotes the estimated parameters. The other cases can be dealt with in a similar manner.
The VECM model
\[ \Delta\mathbf{z}_{t}=\boldsymbol{\alpha}_{0}-\mathbf{A}\mathbf{z}_{t-1}+ \sum_{j=1}^{p-1}\boldsymbol{\Gamma}_{j}\Delta\mathbf{z}_{t-j}+\boldsymbol{\varepsilon}_{t} (\#eq:vecmhat)\] is estimated as well (imposing weak exogeneity), and the residuals
\[ \widehat{\boldsymbol{\varepsilon}}_{xt}= \Delta\mathbf{x}_{t}-\widehat{\boldsymbol{\alpha}}_{0x}+\widehat{\mathbf{A}}_{xx}\mathbf{x}_{t-1}- \sum_{j=1}^{p-1}\widehat{\boldsymbol{\Gamma}}_{(x)j}\Delta\mathbf{z}_{t-j} (\#eq:resvecm)\] are computed. Thsis approach guarantees that the residuals \(\widehat{\boldsymbol{\varepsilon}}_{xt}\), associated to the variables \(\mathbf{x}_{t}\) explained by the marginal model @ref(eq:marg), are uncorrelated with the ARDL residuals \(\widehat{\nu}_{yt}^{.}\).
A large set of \(B\) bootstrap replicates are sampled from the residuals calculated as in @ref(eq:resfov),@ref(eq:rest), @ref(eq:resfind) and @ref(eq:resvecm). In each replication, the following operations are carried out:
Each set of \((T-p)\) resampled residuals (with replacement) \(\widehat{\boldsymbol{\nu}}_{zt}^{(b)}=(\widehat{\nu}_{yt}^{(b)},\widehat{\boldsymbol{\varepsilon}}_{xt}^{(b)})\) is re-centered (see Davidson and MacKinnon 2005) \[\begin{align} \dot{\widehat{\nu}}^{(b)}_{yt}&=\widehat{\nu}^{(b)}_{yt} -\frac{1}{T-p}\sum_{t=p+1}^{T}\widehat{\nu}^{(b)}_{yt} (\#eq:recentery) \\ \dot{\widehat{\boldsymbol{\varepsilon}}}^{b}_{x_{i}t}&=\widehat{\boldsymbol{\varepsilon}}^{(b)}_{x_{i}t}-\frac{1}{T-p}\sum_{t=p+1}^{T}\widehat{\boldsymbol{\varepsilon}}^{(b)}_{x_{i}t}\qquad i=1,\dots,K.(\#eq:recenterx) \end{align}\]
A sequential set of \((T-p)\)
bootstrap observations, \(y^{*}_{t}\enspace,
\mathbf{x}^{*}_{t}\enspace t=p+1,\dots,T\), is generated as
follows \[y^{*}_{t}=y^{*}_{t-1}+\Delta
y^{*}_{t}, \enspace \enspace
\mathbf{x}^{*}_{t}=\mathbf{x}^{*}_{t-1}+\Delta
\mathbf{x}^{*}_{t},\] where \(\Delta
\mathbf{x}^{*}_{t}\) are obtained from @ref(eq:resvecm) and \(\Delta y^{*}_{t}\) from either
@ref(eq:resfov), @ref(eq:rest) or @ref(eq:resfind) after replacing in
each of these equations the original residuals with the bootstrap
ones.
The initial conditions, that is the observations before \(t=p+1\), are obtained by drawing randomly
\(p\) observations in block from the
original data, so as to preserve the data dependence structure.
An unrestricted ARDL model is estimated via OLS using the bootstrap observations, and the statistics \(F_{ov}^{(b),H_0}\), \(t^{(b),H_0}\) \(F_{ind}^{(b),H_0}\) are computed.
The bootstrap distributions of \(\big\{F_{ov}^{(b),H_0}\big\}_{b=1}^B\),
\(\big\{F_{ind}^{(b),H_0}\big\}_{b=1}^B\) and
\(\big\{t^{(b),H_0}\big\}_{b=1}^B\)
under the null are then employed to determine the critical values of the
tests. By denoting with \(M^*_b\) the
ordered bootstrap test statistic, and with \(\alpha\) the nominal significance level,
the bootstrap critical values are determined as follows \[
c^*_{\alpha,M}=\min\bigg\{c:\sum_{b=1}^{B}\mathbf{1}_{\{M^*_b >c\}}
\leq\alpha\bigg\}
\qquad M\in\{F_{ov},F_{ind}\} (\#eq:bootf)\] for the \(F\) tests and \[
c^*_{{\alpha,t}}=\max\bigg\{c:\sum_{b=1}^{B}\mathbf{1}_{\{t^*_b<c\}} \leq
{\alpha}\bigg\} (\#eq:boott)\] for the \(t\) test.
Here, \(\mathbf{1}_{\{x \in A\}}\) is
the indicator function, which is equal to one if the condition in
subscript is satisfied and zero otherwise.
The null hypothesis is rejected if the \(F\) statistic computed at step 1, \(F_{ov}\) or \(F_{ind}\), is greater than the respective \(c^*_{\alpha,M}\), or if the \(t\) statistic computed at the same step is lower than \(c^*_{{\alpha,t}}\).
This section describes the main functionalities of the bootCT
package. The functions included in the package are essentially of two
types. The function sim_vecm_ardl generates data according
to a given data generating process (DGP), assuming either the presence
or the absence of cointegrating relationships between variables, or
degenerate cases. The function boot_ardl tests the presence
of cointegrating relationships employing the Pesaran ARDL bound tests
(\(F_{ov}\) and \(t\)), the SMK bound test on lagged
independent variables (\(F_{ind}\)),
and the novel ARDL bootstrap testing procedure.
sim_vecm_ardl functionThe function sim_vecm_ardl allows to simulate a
multivariate time series from a given conditional ARDL specification for
a dependent variable \(y_t\) and a
VAR/VECM specification for the remaining independent variables \(\mathbf{x}_t\). In sthis regard, it
represents an interesting addition to extant data generating procedures
for VAR/VECM models. The arguments of this function can be divided into
two subgroups.
A group of parameters pertains the VECM model @ref(eq:ardl) and
@ref(eq:marg), with \(\mathbf{A}_{xx}\)
identifying the matrix of the long-run relationships among the \(\mathbf x_t\) variables, and \(\boldsymbol\Gamma_j\)’s, \(j=1,...,p-1\) the short-run matrices of the
system variables. Additionally, the parameter \(a_{yy}\) weighs the EC term for \(y_t\), while \(\mathbf{a}_{yx}'\) is the parameter
vector weighting the variables \(\mathbf{x}_{t}\) in the ARDL equation. The
vector \(\mathbf{a}_{yx}'\), after
conditioning \(y_t\) on the other
variables (\(\mathbf{x}_t\), see model
@ref(eq:ardl)) becomes \(\widetilde{\mathbf{a}}_{y.x}' =
\mathbf{a}_{yx}' -
\boldsymbol\omega'\mathbf{A}_{xx}\).
The second group of parameters concerns the model intercept and trend of
the VAR specification, \(\boldsymbol\mu\) and \(\boldsymbol\eta\), which in the VECM
representation become \(\boldsymbol\alpha_0 =
\mathbf A \boldsymbol\mu + (\mathbf I_{K+1} - \sum_{i=1}^{p-1}
\boldsymbol\Gamma_j-\mathbf A)\boldsymbol\eta\) and \(\boldsymbol{\alpha}_1 = \mathbf A
\boldsymbol\eta\) and in the conditional ARDL become \(\alpha_{0.y}^{EC}=
a_{yy}(\mu_{y}-\widetilde{\mathbf{ a}}_{y.x}'\boldsymbol
\mu_{x})+\boldsymbol\gamma_{y.x}'(1)\boldsymbol\eta\) and
\(a_{1.y}^{EC}=a_{yy}(\eta_y-\widetilde{\mathbf{
a}}_{y.x}'\boldsymbol\eta_x)\). As explained in Appendix 7.2, intercept and trend appear in the error
correction (EC) term of the ARDL equation only when restricted.
Accordingly, they both do not appear in the EC in the case I, the
intercept does not appear in the EC term in cases III, IV and V (it is
freely set to \(\alpha_{0.y}\)) while
the trend appears in the EC term only in the case IV (it is freely set
to \(\alpha_{1.y}\) for case V).
Accordingly, when these terms are not restricted, they need to be
supplied by the user.
The approach used to specify the function inputs offers great control to
the user, in terms of generating specific (conditional) ARDL-based
cointegration structures.
The function sim_vecm_ardl takes the following
arguments:
nobs: number of observations to generate;
case: indicates the conditional ARDL specification
in terms of deterministic component (intercept and trend) among the five
specifications proposed by Pesaran, Shin, and
Smith (2001), given in @ref(eq:case1)-@ref(eq:case5).
sigma.in: covariance matrix, \(\boldsymbol{\Sigma}\), of the error term
\(\boldsymbol{\varepsilon}_{t}\);
gamma.in: list of short-run parameter matrices \(\boldsymbol\Gamma_j\);
axx.in: cointegrating relationships, \(\mathbf{A}_{xx}\), pertaining the
independent variables in the marginal VECM model;
ayx.uc.in: vector of parameters, as in \(\mathbf{a}_{yx}\);
ayy.in: the \(a_{yy}\) term, weighting the EC term in the
ARDL equation;
mu.in: mean vector, \(\boldsymbol\mu\), in the starting VAR
specification, used to define the VECM intercept for CASE II;
eta.in: trend vector, \(\boldsymbol\eta\), in the starting VAR
specification, used to define the VECM trend for case IV;
azero.in: unrestricted intercept of the VECM
specification (valid only for cases III, IV and V), when the intercept
is not involved in the EC term;
aone.in: unrestricted coefficient of the trend in
the VECM specification (valid only for case V), when the trend is not
involved in the EC term;
burn.in: additional observations burn-in
observations to be generated. A total of burn.in + nobs
observations are generated, but only the last nobs are kept
in the data;
seed.in: seed number for the generation of \(\boldsymbol{\varepsilon}_{t}\sim N(\mathbf
0,\boldsymbol\Sigma)\).
If parameter values for mu.in, eta.in,
azero.in, or aone.in and case number turn out
to be in contradiction, an error message is displayed.
As output, the function gives out a list containing the data, both in
level and first difference, along with all the parameter values given as
input. Additionally, all intermediate transformation of parameters via
VECM transformation or as a by-product of conditioning \(y_{t}\) on \(\mathbf{x}_{t}\) are included in the
output.
Figure @ref(fig:figsim) depicts three-time series, dep_1_0,
ind_1_0 and ind_2_0, generated using this
function and affected by a cointegrating relationship, one panel for
each case, from I to V. The variable dep_1_0 represents the
dependent variable \(y_t\) of the ARDL
equation, while ind_1_0 and ind_2_0 the
independent ones, \(x_{1t}\) and \(x_{2t}\).
The code used to generate the data for case I is the following:
corrm = matrix(c( 0, 0, 0,
0.25, 0, 0,
0.4, -0.25, 0), nrow = 3, ncol = 3, byrow = T)
Corrm = (corrm + t(corrm)) + diag(3)
sds = diag(c(1.3, 1.2, 1))
sigma.in = (sds %*% Corrm %*% t(sds))
gamma1 = matrix(c(0.6, 0, 0.2,
0.1, -0.3, 0,
0, -0.3, 0.2), nrow = 3, ncol = 3,byrow=T)
gamma2= gamma1 * 0.3
omegat = sigma.in[1, -1] %*% solve(sigma.in[-1, -1])
axx.in = matrix(c( 0.3, 0.5,
-0.4, 0.3), nrow = 2, ncol = 2, byrow = T)
ayx.uc.in = c(0.4, 0.4)
ayy.in = 0.6
data.vecm.ardl_1 =
sim_vecm_ardl(nobs = 200,
case = 1,
sigma.in = sigma.in,
gamma.in = list(gamma1, gamma2),
axx.in = axx.in,
ayx.uc.in = ayx.uc.in,
ayy.in = ayy.in,
mu.in = rep(0, 3),
eta.in = rep(0, 3),
azero.in = rep(0, 3),
aone.in = rep(0, 3),
burn.in = 100,
seed.in = 999)
Additionally, Figure @ref(fig:figsim2) displays other three time
series, dep_1_0 (\(y_t\)),
ind_1_0 (\(x_{1t}\)) and
ind_2_0 (\(x_{2t}\)), when
a degeneracy of second type occurs (\(a_{yy} =
0\)) in the long-run relationship in the ARDL equation of
dep_1_0 on ind_1_0, ind_2_0. The
five panels represents the behavior of these series in the Cases from I
to V. It is worth noting the different scenario implied by these cases:
case III depicts a trend for the \(y_t\) variable, case IV highlights the
inclusion of a trend in the cointegrating relationship, and case V
exhibits a quadratic trend in the \(y_t\) variable.
Finally, the flowchart in Figure @ref(fig:figflowchart) details the
internal steps of the function sim_vecm_ardl and the data
generation workflow. There, it is specified how the parameters of the
VAR, VECM and ARDL equation are introduced. Attention is paid on whether
the error correction mechanism involves either intercept or trend (or
both) via the internal computation of the parameters \(\theta_0\) and \(\theta_1\) (and thus \(\alpha_{0.y}^{EC}\) and \(\alpha_{1.y}^{EC}\)). When the EC term does
not involve intercept and/or trend, \(\boldsymbol\alpha_0\) and \(\boldsymbol\alpha_1\) are supplied by the
user, depending on the case under study.
Simulated data from the VECM / conditional ARDL specifications, for every case. Made with ggplot .
Simulated data from the VECM / conditional ARDL specifications (degenerate case of type 2, a_{yy}=0), for every case. Made with ggplot.
Flowchart of the sim_vecm_ardl function inner steps. When applying (7) and (8), \(y_{t_j}=0, \Delta y_{t_j}=0, \mathbf x_{t_j}=\mathbf 0, \Delta \mathbf x_{t_j}= \mathbf 0\) for any \(t_j < 1\). Boxes denote parameter definitions and transformations. Circles denote crucial actions, Empty nodes denote function inputs.
boot_ardl
functionThis function develops the bootstrap procedure detailed previously.
As an option in the initial estimation phase, it offers the possibility
of automatically choosing the best order for the lagged differences of
all the variables in the ARDL and VECM models. This is done by using
several criteria. In particular, AIC, BIC, AICc, \(R^2\) and \(R^2_{adj}\) are used as lag selection
criteria for the ARDL model, while the overall minimum between AIC,
HQIC, SC and FPE is used for the lag selection for the VECM.
In particular, the auto_ardl function in the package ARDL
(Natsiopoulos and Tzeremes 2021) selects
the best ARDL order in terms of the short-run parameter vectors \(\boldsymbol\gamma_{y.x,j}\), while the
VARselect function in the package vars
(Bernhard Pfaff 2008) selects the best
VECM order in terms of the short-run parameter matrices \(\boldsymbol\Gamma_{(x),j}\). Furthermore,
the user can input a significance threshold for the retention of single
parameters in the \(\boldsymbol\Gamma_j\) and in the \(\boldsymbol\gamma_{y.x,j}\) vectors.
The function boot_ardl takes the following arguments:
data: input dataset. Must contain a dependent
variable and a set of independent variables;
yvar: name of the dependent variable enclosed in
quotation marks. If unspecified, the first variable in the dataset is
used;
xvar: vector of names of the independent variables,
each enclosed in quotation marks. If unspecified, all variables in the
dataset except the first are used;
fix.ardl: vector \((j_1,\dots,j_K)\), containing the maximum
orders of the lagged differences (i.e., \(\Delta y_{t-j_1}, \Delta
x_{1,t-j_2},\dots,\) \(\Delta
x_{1,t-j_K}\)) for the short term part of the ARDL equation,
chosen in advance;
info.ardl: (alternatively to fix.ardl)
the information criterion used to choose the best lag order for the
short term part of the ARDL equation. It must be one between
AIC (default), AICc, BIC,
R2, , adjR2;
fix.vecm: scalar \(m\) containing the maximum order of the
lagged differences (i.e., \(\Delta\mathbf
z_{t-m}\)) for the short term part of the VECM equation, chosen
in advance;
info.vecm: (alternatively to fix.vecm)
the information criterion used to choose the best lag order for the
short term part of the VECM equation. Must be one among AIC
(default), HQIC, SC,
FPE;
maxlag: (in conjunction with info.ardl
/ info.vecm) maximum number of lags for the
auto_ardl function in the package ARDL,
and for the VARselect function in the package vars;
a.ardl: significance threshold for the short-term
ARDL coefficients (\(\boldsymbol\gamma_{y.x,j}\)) in the ARDL
model estimation;
a.vecm: significance threshold for the short-term
VECM coefficients (in \(\boldsymbol\Gamma_j\)) in the VECM model
estimation;
nboot: number of bootstrap replications;
case: type of the specification for the conditional
ARDL in terms of deterministic components (intercept and trend) among
the five proposed by (Pesaran, Shin, and Smith
2001), given in @ref(eq:case1)-@ref(eq:case5);
a.boot.H0: probability/ies \(\alpha\) by which the critical quantiles of
the bootstrap distribution(s) \(c^{*}_{\alpha,F_{ov}}\), \(c^{*}_{\alpha,t}\) and \(c^{*}_{\alpha,F_{ind}}\) must be
calculated;
print: if set to TRUE, shows the
progress bar.
boot_ardl makes use of the lag_mts function
which produces lagged versions of a given matrix of time series, each
column with a separate order. lag_mts takes as parameters
the data included in a matrix X and the lag orders in a
vector k, with the addition of a boolean parameter
last.only, which allows to specify whether only the \(k\)-th order lags have to be retained, or
all the lag orders from the first to the \(k\)-th.
boot_ardl also acts as a wrapper for the most common
methodologies detecting cointegration, offering a comprehensive view on
the testing procedures involved in the analysis. The resulting object,
of class bootCT, contains all the information about
The conditional ARDL model estimates, and the unconditional VECM model estimates;
the bootstrap tests performed in the conditional ARDL model;
the Pesaran, Shin and Smith bound testing procedure (\(F_{ov}\) and \(t\)-test, when applicable);
the Sam, McNown and Goh bound testing procedure for \(F_{ind}\), when applicable;
the Johansen rank and trace cointegration tests on the independent variables.
Internally, the bootstrap data generation under the null is executed
via a Rcpp function, employing the Rcpp
and RcppArmadillo
packages (Eddelbuettel 2013), so as to
greatly speed up computational times. As explained in the previous
section, cointegration tests in the unconditional ARDL model are
performed in order to uncover the presence of spurious cointegrating
relationships.
To this end, the function provides
the bootstrap critical values of the \(F_{ov}\), \(t\) and \(F_{ind}\) tests in the conditional model,
at level a.boot.H0, along with the same statistics computed
in the conditional model.
a flag, called fakecoint, that indicates divergence
between the outcomes of the \(F_{ind}\)
test performed in both the conditional and unconditional model. In this
circumstance, as explained before, there is no cointegration (see Bertelli, Vacca, and Zoia 2022).
A summary method has been implemented to present the
results in a visually clear manner. It accepts the additional argument
"out" that lets the user choose which output(s) to
visualize: ARDL prints the conditional ARDL model summary,
VECM prints the VECM model summary, cointARDL
prints the summary of the bound tests and the bootstrap tests,
cointVECM prints the summary of the Johansen test on the
independent variables.
A detailed flowchart showing the function’s workflow is displayed in
Figure @ref(fig:figflowchart-ardl). There, the expressions "C ARDL" and
"UC ARDL" stand for conditional and unconditional ARDL model,
respectively.
Flowchart of the boot_ardl function inner steps. Boxes denote parameter definitions and transformations. Diamonds denote function outputs. Dashed diamonds denote intermediate output (not shown after function call). Empty nodes denote function inputs. The first p+1 rows of \(\mathbf z_t^{(b)}\) are set equal to the first p+1 rows of the original data. The best lag order for each difference variable in the ARDL model is determined via auto_ardl(). It is reported as a unique value p in \(\boldsymbol{\gamma}_{y.x,j}\) for brevity in the flowchart.
In order to investigate the sensitivity of the procedure to different
sample sizes and number of bootstrap replicates, an experiment has been
run using a three-dimensional time series of length \(T=\{50,80,100,200,500\}\), generating 100
datasets for each sample size with the sim_vecm_ardl
function (Case II, with cointegrated variables, and 2 lags in the
short-run section of the model).
Then, the boot_ardl function has been called
boot_ardl(data = df_sim,
nboot = bootr,
case = 2,
fix.ardl = rep(2, 3),
fix.vecm = 2)
In the code above, bootr has been set equal to \(B=\{200,500,1000,2000\}\), the number of
lags has been assumed known (fix.ardl and
fix.vecm), while default values have been used for every
other argument (such as a.ardl, a.vecm and
a.boot.H0).
Table 1 shows the average running time per
replication together with the coefficient of variation (%) of the
bootstrap critical values of the \(F_{ov}\) test, for each value of \(T\) and \(B\), across 100 replications for each
scenario.
Naturally, the running time increases as both sample size and bootstrap
replicates increase. However, it can be noticed how the coefficients of
variation tend to stabilize for \(B \geq
1000\), especially for \(T>80\), at the 5% significance level.
Therefore, it is recommended a number of bootstrap replicates of at
least \(B=1000\) for higher sample
size, or at least \(B=2000\) for
smaller samples. The analysis has been carried out using an Intel(R)
Core(TM) i7-1165G7 CPU @ 2.80GHz processor, 16GB of RAM.
| \(T\) | \(B\) | Exec. Time (sec) | \(cv^{(F_{ov})}(5\%)\) | \(cv^{(F_{ov})}(2.5\%)\) | \(cv^{(F_{ov})}(1\%)\) |
|---|---|---|---|---|---|
| 50 | 200 | 23.38 | 8.648 | 10.925 | 13.392 |
| 50 | 500 | 48.37 | 6.312 | 6.952 | 8.640 |
| 50 | 1000 | 96.65 | 4.806 | 5.613 | 6.288 |
| 50 | 2000 | 231.15 | 4.255 | 4.226 | 4.946 |
| 80 | 200 | 23.46 | 7.251 | 8.936 | 11.263 |
| 80 | 500 | 50.19 | 4.998 | 6.220 | 7.946 |
| 80 | 1000 | 143.00 | 3.882 | 4.453 | 5.305 |
| 80 | 2000 | 255.64 | 2.912 | 3.623 | 4.518 |
| 100 | 200 | 37.89 | 7.707 | 8.583 | 10.955 |
| 100 | 500 | 52.86 | 4.691 | 5.304 | 7.557 |
| 100 | 1000 | 184.51 | 3.512 | 4.567 | 5.695 |
| 100 | 2000 | 212.65 | 3.519 | 3.674 | 4.185 |
| 200 | 200 | 35.46 | 6.644 | 7.173 | 10.365 |
| 200 | 500 | 76.78 | 4.734 | 5.355 | 6.225 |
| 200 | 1000 | 148.25 | 3.124 | 4.177 | 5.034 |
| 200 | 2000 | 484.51 | 2.811 | 3.361 | 3.907 |
| 500 | 200 | 54.47 | 6.641 | 8.694 | 10.414 |
| 500 | 500 | 133.17 | 5.137 | 5.816 | 6.408 |
| 500 | 1000 | 271.87 | 3.905 | 4.585 | 5.283 |
| 500 | 2000 | 561.71 | 3.221 | 3.490 | 4.145 |
Table 1: Average execution times (in seconds) of the
boot_ardl function, for different combinations of sample
size \(T\) and bootstrap replicates
\(B\). Coefficients of variation (\(cv\)) reported for the \(F_{ov}\) bootstrap critical values at level
5%, 2.5% and 1%.
This section provides two illustrative application which highlight the performance of the bootstrap ARDL tests.
In the first example, the occurrence of a long-run relationship
between consumption \[C\], income
\[INC\], and investment \[INV\] of Germany has been investigated via
a set of ARDL models, where each variable takes in turn the role of
dependent one, while the remaining are employed as independent. The
models have been estimated by employing the dataset of Lütkepohl (2005) which includes quarterly data
of the series over the years 1960 to 1982. The data have been employed
in logarithmic form. Figure @ref(fig:figplotemp) displays these series
over the sample period.
Before applying the bootstrap procedure, the order of integration of
each series has been analyzed. Table 2 shows the
results of ADF test performed on both the series and their
first-differences (\(k=3\) maximum
lags). The results confirm the applicability of the ARDL framework as no
series is integrated of order higher than one.
The following ARDL equations have been estimated:
First ARDL equation (C | INC, INV): \[\begin{align} \Delta \log \text{C}_{t}&=\alpha_{0.y} - a_{yy} \log \text{C}_{t-1} - {a}_{y.x_1}\log \text{INC}_{t-1} - {a}_{y.x_2}\log \text{INV}_{t-1} +\\\nonumber &\sum_{j=1}^{p-1}\gamma_{y.j} \Delta\log \text{C}_{t-j} + \sum_{j=1}^{s-1}\gamma_{x_1.j} \Delta\log \text{INC}_{t-j} + \sum_{j=1}^{r-1}\gamma_{x_2.j} \Delta\log \text{INV}_{t-j} +\\\nonumber &\omega_1 \Delta\log \text{INC}_{t}+ \omega_2 \Delta\log \text{INV}_{t}+\nu_{t}. \end{align}\]
Second ARDL equation (INC | C, INV): \[\begin{align} \Delta \log \text{INC}_{t}&=\alpha_{0.y} - a_{yy} \log \text{INC}_{t-1} - {a}_{y.x_1}\log \text{C}_{t-1} - {a}_{y.x_2}\log \text{INV}_{t-1} +\\\nonumber &\sum_{j=1}^{p-1}\gamma_{y.j} \Delta\log \text{INC}_{t-j} + \sum_{j=1}^{s-1}\gamma_{x_1.j} \Delta\log \text{C}_{t-j} + \sum_{j=1}^{r-1}\gamma_{x_2.j} \Delta\log \text{INV}_{t-j} +\\\nonumber &\omega_1 \Delta\log \text{C}_{t}+ \omega_2 \Delta\log \text{INV}_{t}+\nu_{t}. \end{align}\]
Third ARDL equation (INV | C, INC): \[\begin{align} \Delta \log \text{INV}_{t}&=\alpha_{0.y} - a_{yy} \log \text{INV}_{t-1} - {a}_{y.x_1}\log \text{C}_{t-1} - {a}_{y.x_2}\log \text{INC}_{t-1} +\\\nonumber &\sum_{j=1}^{p-1}\gamma_{y.j} \Delta\log \text{INV}_{t-j} + \sum_{j=1}^{s-1}\gamma_{x_1.j} \Delta\log \text{C}_{t-j} + \sum_{j=1}^{r-1}\gamma_{x_2.j} \Delta\log \text{INC}_{t-j} +\\\nonumber &\omega_1 \Delta\log \text{C}_{t}+ \omega_2 \Delta\log \text{INC}_{t}+\nu_{t}. \end{align}\]
Table 3 shows the estimation results for each
ARDL and VECM model. It is worth noting that the instantaneous
difference of the independent variables are highly significant in each
conditional ARDL model. Thus, neglecting these variables in the ARDL
equation, as happens in the unconditional version of the model, may
potentially lead to biased estimates and incorrect inference. For the
sake of completeness, also the results of the marginal VECM estimation
are reported for each model.
The code to prepare the data, available in the package as the
ger_macro dataset, is:
data("ger_macro")
LNDATA = apply(ger_macro[,-1], 2, log)
col_ln = paste0("LN", colnames(ger_macro)[-1])
LNDATA = as.data.frame(LNDATA)
colnames(LNDATA) = col_ln
Then, the boot_ardl function is called, to perform the
bootstrap tests. In the code chunk below, Model I is considered.
set.seed(999)
BCT_res_CONS = boot_ardl(data = LNDATA,
yvar = "LNCONS",
xvar = c("LNINCOME", "LNINVEST"),
maxlag = 5,
a.ardl = 0.1,
a.vecm = 0.1,
nboot = 2000,
case = 3,
a.boot.H0 = c(0.05),
print = T)
to which follows the call to the summary function
summary(BCT_res_CONS, out = "ARDL")
summary(BCT_res_CONS, out = "VECM")
summary(BCT_res_CONS, out = "cointVECM")
summary(BCT_res_CONS, out = "cointARDL")
The first summary line displays the output in the ARDL column of
Table 3 and the second column of Table 4, Model I. The second line corresponds to the
VECM columns of Table 3, Model I - only for the
independent variables. The information on the rank of the \(\mathbf A_{xx}\) in Table 3 is inferred from the third line. Finally, the
fourth summary line corresponds to the test results in Table 4, Model I. A textual indication of the
presence of spurious cointegration is displayed at the bottom of the
"cointARDL" summary, if detected.
In this example, the bootstrap and bound testing procedures are in
agreement only for model I, indicating the existence of a cointegrating
relationship. Additionally, no spurious cointegration is detected for
this model. As for models II and III, the null hypothesis is not
rejected by the bootstrap tests, while the PSS and SMG bound tests fail
to give a conclusive answer in the \(F_{ind}\) test.
The running time of the entire analysis is of roughly 11 minutes, using
an Intel(R) Core(TM) i7-1165G7 CPU @ 2.80GHz processor, 16GB of RAM.
| level variable | first difference | ||||
|---|---|---|---|---|---|
| Series | lag | ADF | p.value | ADF | p-value |
| \(\log\text{C}_t\) | 0 | -1.690 | 0.450 | -9.750 | <0.01 |
| 1 | -1.860 | 0.385 | -5.190 | <0.01 | |
| 2 | -1.420 | 0.549 | -3.130 | 0.030 | |
| 3 | -1.010 | 0.691 | -2.720 | 0.080 | |
| \(\log\text{INC}_t\) | 0 | -2.290 | 0.217 | -11.140 | <0.01 |
| 1 | -1.960 | 0.345 | -7.510 | <0.01 | |
| 2 | -1.490 | 0.524 | -5.120 | <0.01 | |
| 3 | -1.310 | 0.587 | -3.290 | 0.020 | |
| \(\log\text{INV}_t\) | 0 | -1.200 | 0.625 | -8.390 | <0.01 |
| 1 | -1.370 | 0.565 | -5.570 | <0.01 | |
| 2 | -1.360 | 0.570 | -3.300 | 0.020 | |
| 3 | -1.220 | 0.619 | -3.100 | 0.032 |
log-consumption/investment/income graphs (level variables and first differences). Made with ggplot.
| Model I | Model II | Model III | |||||||
|---|---|---|---|---|---|---|---|---|---|
| ARDL | VECM | ARDL | VECM | ARDL | VECM | ||||
| \(\Delta\log\text{C}_t\) | \(\Delta\log\text{INV}_t\) | \(\Delta\log\text{INC}_t\) | \(\Delta\log\text{INC}_t\) | \(\Delta\log\text{C}_t\) | \(\Delta\log\text{INV}_t\) | \(\Delta\log\text{INV}_t\) | \(\Delta\log\text{C}_t\) | \(\Delta\log\text{INC}_t\) | |
| \(\log\text{C}_{t-1}\) | -0.307 *** (0.055) | 0.168 * (0.081) | -0.0011 (0.0126) | 0.1286 * (0.0540) | 0.611 . (0.339) | -0.2727 *** (0.0704) | -0.0508 (0.0796) | ||
| \(\log\text{INC}_{t-1}\) | 0.297 *** (0.055) | 0.124 * (0.054) | -0.017 (0.014) | -0.183 * (0.079) | -0.491 (0.340) | 0.2619 *** (0.0681) | 0.0464 (0.0772) | ||
| \(\log\text{INV}_{t-1}\) | -0.001 (0.011) | -0.152 * (0.063) | 0.016 (0.017) | 0.0209 (0.0135) | -0.00107 (0.0142) | -0.1531 * (0.0607) | -0.1212 * (0.060) | ||
| \(\Delta\log\text{C}_{t-1}\) | -0.248 ** (0.079) | 0.899 * (0.442) | 0.211 . (0.113) | 0.375 *** (0.1086) | 0.9288 * (0.442) | 1.113 * (0.441) | 0.2072 . (0.1142) | ||
| \(\Delta\log\text{C}_{t-2}\) | 0.744 (0.431) | 0.8049 . (0.4345) | |||||||
| \(\Delta\log\text{INC}_{t-1}\) | -0.1404 (0.1095) | ||||||||
| \(\Delta\log\text{INC}_{t-2}\) | 0.2675 ** (0.0958) | 0.1522 . (0.0912) | |||||||
| \(\Delta\log\text{INV}_{t-1}\) | -0.18 (0.111) | 0.035 (0.029) | -0.189 . (0.1097) | -0.175 (0.1075) | 0.0479 . (0.0282) | ||||
| \(\Delta\log\text{INV}_{t-2}\) | 0.049 . (0.027) | 0.0591 * (0.0245) | 0.0578 * (0.0223) | 0.0562 * (0.0266) | |||||
| \(\Delta\log\text{C}_t\) | 0.7070 *** (0.1093) | 1.8540 *** (0.5425) | |||||||
| \(\Delta\log\text{INC}_t\) | 0.471 *** (0.074) | -0.445 *** (0.4726) | |||||||
| \(\Delta\log\text{INV}_t\) | 0.065 ** (0.019) | -0.0230 (0.025) | |||||||
| const. | 0.048 *** (0.013) | 0.036 (0.066) | 0.033 * (0.017) | 0.002 (0.018) | 0.0266 . (0.0155) | 0.023 (0.0666) | -0.056 (0.072) | 0.0517 ** (0.0157) | 0.0378 * (0.0177) |
| J-test | \(rk(\mathbf{A_{xx}})=2\) | \(rk(\mathbf{A_{xx}})=2\) | \(rk(\mathbf{A_{xx}})=2\) |
| PSS / SMG Threshold | Outcome | |||||||
|---|---|---|---|---|---|---|---|---|
| Model | Lags | Test | Boot. Critical Values | I(0) 5% | I(1) 5% | Statistic | Boot | Bound |
| I | (1,0,0) | \(F_{ov}\) | 3.79 | 3.79 | 4.85 | 10.75 | Y | Y |
| \(t\) | -2.88 | -2.86 | -3.53 | -5.608 | ||||
| \(F_{ind}\) | 4.92 | 3.01 | 5.42 | 15.636 | ||||
| II | (1,1,0) | \(F_{ov}\) | 5.79 | 3.79 | 4.85 | 2.867 | N | U |
| \(t\) | -3.69 | -2.86 | -3.53 | -2.315 | ||||
| \(F_{ind}\) | 7.38 | 3.01 | 5.42 | 3.308 | ||||
| III | (1,1,0) | \(F_{ov}\) | 5.50 | 3.79 | 4.85 | 3.013 | N | U |
| \(t\) | -3.32 | -2.86 | -3.53 | -2.020 | ||||
| \(F_{ind}\) | 6.63 | 3.01 | 5.42 | 4.189 |
Following Bertelli, Vacca, and Zoia (2022), the relationship between foreign direct investment \[FDI\], exports \[EXP\], and gross domestic product \[GDP\] in Italy is investigated. The data of these three yearly variables have been retrieved from the World Bank Database and cover the period from 1970 to 2020. In the analysis, the log of the variables has been used and \[EXP\] and \[FDI\] have been adjusted using the GDP deflator. Figure @ref(fig:figplotemp2) displays these series over the sample period.
log-GDP/export/investment graphs (level variables and first differences). Made with ggplot.
Table 5 shows the outcomes of the ADF test performed on each variable, which ensures that the integration order is not higher than one for all variables. Table 6 shows the results of bound and bootstrap tests performed in ARDL model by taking each variable, in turn, as the dependent one. The following ARDL equations have been estimated:
First ARDL equation (GDP | EXP, FDI): \[\begin{align} \Delta \log \text{GDP}_{t}&=\alpha_{0.y} - a_{yy} \log \text{GDP}_{t-1} - {a}_{y.x_1}\log \text{EXP}_{t-1} - {a}_{y.x_2}\log \text{FDI}_{t-1} +\\\nonumber &\sum_{j=1}^{p-1}\gamma_{y.j} \Delta\log \text{GDP}_{t-j} + \sum_{j=1}^{s-1}\gamma_{x_1.j} \Delta\log \text{EXP}_{t-j} + \sum_{j=1}^{r-1}\gamma_{x_2.j} \Delta\log \text{FDI}_{t-j} +\\\nonumber &\omega_1 \Delta\log \text{EXP}_{t}+ \omega_2 \Delta\log \text{FDI}_{t}+\nu_{t}. \end{align}\] For this model, a degenerate case of the first type can be observed, while the simpler bound testing procedure does not signal cointegration.
Second ARDL equation (EXP | GDP, FDI): \[\begin{align} \Delta \log \text{EXP}_{t}&=\alpha_{0.y} - a_{yy} \log \text{EXP}_{t-1} - {a}_{y.x_1}\log \text{GDP}_{t-1} - {a}_{y.x_2}\log \text{FDI}_{t-1} +\\\nonumber &\sum_{j=1}^{p-1}\gamma_{y.j} \Delta\log \text{EXP}_{t-j} + \sum_{j=1}^{s-1}\gamma_{x_1.j} \Delta\log \text{GDP}_{t-j} + \sum_{j=1}^{r-1}\gamma_{x_2.j} \Delta\log \text{FDI}_{t-j} +\\\nonumber &\omega_1 \Delta\log \text{GDP}_{t}+ \omega_2 \Delta\log \text{FDI}_{t}+\nu_{t}. \end{align}\] For this model, the ARDL bootstrap test indicates absence of cointegration, while the bound testing approach is inconclusive for the \(F_{ind}\) test.
Third ARDL equation (FDI | GDP, EXP): \[\begin{align} \Delta \log \text{FDI}_{t}&=\alpha_{0.y} - a_{yy} \log \text{FDI}_{t-1} - {a}_{y.x_1}\log \text{GDP}_{t-1} - {a}_{y.x_2}\log \text{EXP}_{t-1} +\\\nonumber &\sum_{j=1}^{p-1}\gamma_{y.j} \Delta\log \text{FDI}_{t-j} + \sum_{j=1}^{s-1}\gamma_{x_1.j} \Delta\log \text{GDP}_{t-j} + \sum_{j=1}^{r-1}\gamma_{x_2.j} \Delta\log \text{EXP}_{t-j} +\\\nonumber &\omega_1 \Delta\log \text{GDP}_{t}+ \omega_2 \Delta\log \text{EXP}_{t}+\nu_{t}. \end{align}\] For this model, the long-run cointegrating relationship is confirmed using both boostrap and bound testing. No spurious cointegration is detected.
The code to load the data and perform the analysis (e.g. for Model I) is:
data("ita_macro")
BCT_res_GDP = boot_ardl(data = ita_macro,
yvar = "LGDP",
xvar = c("LEXP", "LFI"),
maxlag = 5,
a.ardl = 0.1,
a.vecm = 0.1,
nboot = 2000,
case = 3,
a.boot.H0 = c(0.05),
print = T)
For the sake of simplicity, the conditional ARDL and VECM marginal models outputs included in each cointegrating analysis is omitted. The summary for the cointegration tests for Model I is called via
summary(BCT_res_GDP, out = "ARDL") # extract lags
summary(BCT_res_GDP, out ="cointARDL") # ARDL cointegration
This empirical application further highlights the importance of dealing with inconclusive inference via the bootstrap procedure, while naturally including the effect of conditioning in the ARDL model, as highlighted in Bertelli, Vacca, and Zoia (2022).
| No Drift, No Trend | Drift, No Trend | Drift and Trend | ||||||||||
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Variable | Lag = 0 | Lag = 1 | Lag = 2 | Lag = 3 | Lag = 0 | Lag = 1 | Lag = 2 | Lag = 3 | Lag = 0 | Lag = 1 | Lag = 2 | Lag = 3 |
| \(\log \text{GDP}_t\) | 0.99 | 0.974 | 0.941 | 0.796 | <0.01 | <0.01 | <0.01 | 0.084 | 0.99 | 0.99 | 0.99 | 0.99 |
| \(\log \text{FDI}_t\) | 0.572 | 0.599 | 0.675 | 0.725 | <0.01 | 0.0759 | 0.3199 | 0.5174 | <0.01 | 0.013 | 0.151 | 0.46 |
| \(\log \text{EXP}_t\) | 0.787 | 0.71 | 0.698 | 0.684 | 0.479 | 0.288 | 0.467 | 0.433 | 0.629 | 0.35 | 0.463 | 0.379 |
| \(\Delta\log \text{GDP}_t\) | <0.01 | <0.0164 | 0.0429 | 0.0402 | <0.01 | 0.0861 | 0.3989 | 0.4267 | <0.01 | <0.01 | 0.0166 | 0.017 |
| \(\Delta\log \text{FDI}_t\) | <0.01 | <0.01 | <0.01 | <0.01 | <0.01 | <0.01 | <0.01 | <0.01 | <0.01 | <0.01 | <0.01 | <0.01 |
| \(\Delta\log \text{EXP}_t\) | <0.01 | <0.01 | <0.01 | <0.01 | <0.01 | <0.01 | <0.01 | <0.01 | <0.01 | <0.01 | 0.0336 | 0.0315 |
| PSS / SMG Threshold | Outcome | |||||||
|---|---|---|---|---|---|---|---|---|
| Model | Lags | Test | Boot. Critical Values | I(0) 5% | I(1) 5% | Statistic | Boot | Bound |
| I | (1,1,0) | \(F_{ov}\) | 3.730 | 4.070 | 5.190 | 9.758 | D1 | N |
| \(t\) | -2.020 | -2.860 | -3.530 | -2.338 | ||||
| \(F_{ind}\) | 3.710 | 3.220 | 5.620 | 2.273 | ||||
| II | (1,0,0) | \(F_{ov}\) | 5.400 | 4.070 | 5.190 | 2.649 | N | U |
| \(t\) | -3.380 | -2.860 | -3.530 | -1.889 | ||||
| \(F_{ind}\) | 5.630 | 3.220 | 5.620 | 3.481 | ||||
| III | (1,0,0) | \(F_{ov}\) | 5.360 | 4.070 | 5.190 | 6.716 | Y | Y |
| \(t\) | -3.550 | -2.860 | -3.530 | -4.202 | ||||
| \(F_{ind}\) | 6.500 | 3.220 | 5.620 | 7.017 |
The bootCT
package allows the user to perform bootstrap cointegration tests in ARDL
models by overcoming the problem of inconclusive inference which is a
well-known drawback of standard bound tests. The package makes use of
different functions. The function boot_ardl performs the
bootstrap tests, and it acts as a wrapper of both the bootstrap and the
standard bound tests, including also the Johansen test on the
independent variables of the model. Finally, it also performs the bound
\(F\)-test on the lagged independent
variables, so far not available in other extant R packages.
The function sim_vecm_ardl, which allows the simulation of
multivariate time series data following a user-defined DGP, enriches the
available procedures for multivariate data generation, while the
function lag_mts provides a supporting tool in building
datasets of lagged variables for any practical purpose. Finally, the use
of Rcpp functions gives a technical advantage in terms of computational
speed, performing the bootstrap analysis within an acceptable time
frame.
Expanding the matrix polynomial \(\mathbf{A}(z)\) about \(z=1\), yields \[ \mathbf{A}(z)=\mathbf{A}(1)z+(1-z)\boldsymbol{\Gamma}(z), (\#eq:polyamat)\] where \[\mathbf{A}(1)=\mathbf{I}_{K+1}-\sum_{j=1}^{p}\mathbf{A}_{j}\]
\[ \boldsymbol{\Gamma}(z)=\mathbf{I}_{K+1}-\sum_{i=1}^{p-1}\boldsymbol{\Gamma}_{i}z^i, \enspace \enspace \boldsymbol{\Gamma}_{i}=-\sum_{j=i+1}^{p}\mathbf{A}_j. (\#eq:polygamma)\] The VECM model @ref(eq:vecm) follows accordingly, and
\[
\boldsymbol{\alpha}_0=\mathbf{A}(1)\boldsymbol{\mu}+(\boldsymbol{\Gamma}(1)-\mathbf{A}(1))\boldsymbol{\eta},
\enspace \enspace \enspace
\boldsymbol{\alpha}_1=\mathbf{A}(1)\boldsymbol{\eta}. (\#eq:vecmint)\]
Assuming that \(\mathbf{A}(1)\) is
singular and that the variables \(\mathbf{x}_{t}\) are cointegrated. This
entails the following \[\begin{align}
\mathbf{A}(1)=&\begin{bmatrix}
\underset{(1,1)}{a_{yy}} & \underset{(1,K)}{\mathbf{a}_{yx}'} \\
\underset{(K,1)}{\mathbf{a}_{xy}} &
\underset{(K,K)}{\mathbf{A}_{xx}}
\end{bmatrix}=\underset{(K+1,r+1)}{\mathbf{B}}\underset{(r+1,K+1)}{\mathbf{C}'}=\begin{bmatrix}b_{yy}
& \mathbf{b}_{yx}'\\ \mathbf{b}_{xy} & \mathbf{B}_{xx}
\end{bmatrix}\begin{bmatrix}c_{yy} &
\mathbf{c}_{yx}'\\ \mathbf{c}_{xy} &
\mathbf{C}_{xx}'\end{bmatrix}= \nonumber\\
=&\begin{bmatrix}b_{yy}c_{yy}+\mathbf{b}_{yx}'\mathbf{c}_{xy}
&
b_{yy}\mathbf{c}_{yx}'+\mathbf{b}_{yx}'\mathbf{C}_{xx}'\\
\mathbf{b}_{xy}c_{yy}+\mathbf{B}_{xx}\mathbf{c}_{xy} &
\mathbf{b}_{xy}\mathbf{c}_{yx}'+ \mathbf{A}_{xx} \end{bmatrix},
\enspace \enspace \enspace
rk(\mathbf{A}(1))=rk(\mathbf{B})=rk(\mathbf{C}),(\#eq:factt)
\end{align}\]
where \(\mathbf{B}\) and \(\mathbf{C}\) are full column rank matrices
arising from the rank-factorization of \(\mathbf{A}(1)=\mathbf{B}\mathbf{C}'\)
with \(\mathbf{C}\) matrix of the
long-run relationships of the process and \(\mathbf{B}_{xx}\), \(\mathbf{C}_{xx}\) arising from the rank
factorization of \(\mathbf{A}_{xx}=\mathbf{B}_{xx}\mathbf{C}_{xx}'\),
with \(rk(\mathbf{A}_{xx})=rk(\mathbf{B}_{xx})=rk(\mathbf{C}_{xx})=r\)
6.
By partitioning the vectors \(\boldsymbol{\alpha}_{0}\), \(\boldsymbol{\alpha}_{1}\), the matrix \(\mathbf{A}(1)\) and the polynomial matrix
\(\boldsymbol{\Gamma}(L)\) conformably
to \(\mathbf{z}_{t}\), as follows
\[ \boldsymbol{\alpha}_0=\begin{bmatrix} \underset{(1,1)}{\alpha_{0y}} \\ \underset{(K,1)}{\boldsymbol{\alpha}_{0x}} \end{bmatrix}, \enspace \enspace \enspace \boldsymbol{\alpha}_1=\begin{bmatrix} \underset{(1,1)}{\alpha_{1y}} \\ \underset{(K,1)}{\boldsymbol{\alpha}_{1x} } \end{bmatrix} (\#eq:alphapart)\]
\[ \mathbf{A}(1)=\begin{bmatrix} \underset{(1,K+1)}{\mathbf{a}'_{(y)}} \\ \underset{(K,K+1)}{\mathbf{A}_{(x)}} \end{bmatrix} =\begin{bmatrix} \underset{(1,1)}{a_{yy}} & \underset{(1,K)}{\mathbf{a}'_{yx}} \\ \underset{(K,1)}{\mathbf{a}_{xy}} & \underset{(K,K)}{\mathbf{A}_{xx} } \end{bmatrix}, \enspace \enspace \enspace \boldsymbol{\Gamma}(L)=\begin{bmatrix} \underset{(1,K+1)}{\boldsymbol{\gamma}'_{y}(L)} \\ \underset{(K,K+1)}{\boldsymbol{\Gamma}_{(x)}(L)} \end{bmatrix} =\begin{bmatrix} \underset{(1,1)}{\gamma_{yy}(L)} & \underset{(1,K)}{\boldsymbol{\gamma}'_{yx}(L)} \\ \underset{(K,1)}{\boldsymbol{\gamma}_{xy}(L)} & \underset{(K,K)}{\boldsymbol{\Gamma}_{xx}(L) } \end{bmatrix} (\#eq:coeffpart)\] , and substituting @ref(eq:epsilonx) into @ref(eq:vecm) yields
\[ \Delta\mathbf{z}_t=\begin{bmatrix} \Delta y_{t} \\ \Delta\mathbf{x}_{t} \end{bmatrix}=\begin{bmatrix} \alpha_{0.y} \\ \boldsymbol{\alpha}_{0x} \end{bmatrix} + \begin{bmatrix} \alpha_{1.y} \\ \boldsymbol{\alpha}_{1x} \end{bmatrix}t- \begin{bmatrix} \mathbf{a}'_{(y).x} \\ \mathbf{A}_{(x)} \end{bmatrix}\begin{bmatrix} y_{t-1} \\ \mathbf{x}_{t-1} \end{bmatrix} + \begin{bmatrix} \boldsymbol{\gamma}'_{y.x}(L) \\ \boldsymbol{\Gamma}_{(x)}(L) \end{bmatrix}\Delta\mathbf{z}_t+\begin{bmatrix} \boldsymbol{\omega}'\Delta\mathbf{x}_{t} \\ \mathbf{0} \end{bmatrix}+\begin{bmatrix} {\nu}_{yt} \\ \boldsymbol{\varepsilon}_{xt} \end{bmatrix} (\#eq:condsys)\] , where
\[ \alpha_{0.y}=\alpha_{0y}-\boldsymbol{\omega}'\boldsymbol{\alpha}_{0x}, \enspace \enspace \enspace \alpha_{1.y}=\alpha_{1y}-\boldsymbol{\omega}'\boldsymbol{\alpha}_{1x} (\#eq:condintt)\]
\[ \mathbf{a}'_{(y).x}=\mathbf{a}'_{(y)}-\boldsymbol{\omega}'\mathbf{A}_{(x)}, \enspace \enspace \enspace \boldsymbol{\gamma}'_{y.x}(L)=\boldsymbol{\gamma}_{y}'(L)-\boldsymbol{\omega}'\boldsymbol{\Gamma}_{(x)}(L). (\#eq:condAmat)\]
According to @ref(eq:condsys), the long-run relationships of the VECM turn out to be now included in the matrix
\[ \begin{bmatrix} \mathbf{a}'_{(y).x} \\ \mathbf{A}_{(x)} \end{bmatrix}=\begin{bmatrix} a_{yy}-\boldsymbol{\omega}'\mathbf{a}_{xy} & \mathbf{a}_{yx}'-\boldsymbol{\omega}'\mathbf{A}_{xx} \\ \mathbf{a}_{xy}&\mathbf{A}_{xx} \end{bmatrix}. (\#eq:condAmat2)\]
To rule out the presence of long-run relationships between \(y_{t}\) and \(\mathbf{x}_{t}\) in the marginal model, the \(\mathbf{x}_{t}\) variables are assumed to be exogenous with respect to the ARDL parameters, that is \(\mathbf{a}_{xy}\) is assumed to be a null vector. Accordingly, the long-run matrix in @ref(eq:condAmat2) becomes
\[ \widetilde{\mathbf{A}}=\begin{bmatrix}a_{yy} & \mathbf{a}'_{yx}-\boldsymbol{\omega}'\mathbf{A}_{xx} \\ \mathbf{0} & \mathbf{A}_{xx} \end{bmatrix}=\begin{bmatrix} a_{yy} & \widetilde{\mathbf{a}}_{y.x}' \\ \mathbf{0}&\mathbf{A}_{xx}\end{bmatrix} =\begin{bmatrix} b_{yy}c_{yy} & b_{yy}\mathbf c_{yx}'+(\mathbf{b}_{yx}'-\boldsymbol{\omega}'\mathbf{B}_{xx})\mathbf{C}_{xx}' \\ \mathbf{0}& \mathbf{B}_{xx}\mathbf{C}_{xx}'\end{bmatrix}. (\#eq:cond)\]
After these algebraic transformations, the ARDL equation for \(\Delta y_{t}\) can be rewritten as in
@ref(eq:ardl).
In light of the factorization @ref(eq:factt) of the matrix \(\mathbf{A}(1)\), the long-run equilibrium
vector \(\boldsymbol{\theta}\) can be
expressed as
\[ \boldsymbol{\theta}'= -\frac{1}{a_{yy}}\underset{(1,r+1)}{\left[b_{yy}\enspace\enspace(\mathbf{b}_{yx}-\boldsymbol{\omega}'\mathbf{B}_{xx})\right]} \underset{(r+1,K)}{\begin{bmatrix} \mathbf{c}'_{yx}\\ \mathbf{C}'_{xx} \end{bmatrix}}, (\#eq:thetat)\]
where
\(\widetilde{\mathbf{a}}_{y.x}=\mathbf{a}_{yx}-\boldsymbol{\omega}'\mathbf{A}_{xx}\).
Bearing in mind that \(\mathbf{C}'_{xx}\) is the cointegrating
matrix for the variables \(\mathbf{x}_t\), the equation
@ref(eq:thetat) leads to the following conclusion
\[
rk\begin{bmatrix}\mathbf{c}'_{yx}\\
\mathbf{C}'_{xx}\end{bmatrix}=\begin{cases}
r \to \enspace y_{t} \sim I(0) \\
r+1 \to \enspace y_{t} \sim I(1)
\end{cases}, (\#eq:rank)\] where \(r=rk(\mathbf{A}_{xx})\) and \(0 \leq r\leq K\).
Pesaran, Shin, and Smith (2001)
introduced five different specifications for the ARDL model, which
depend on the deterministic components that can be absent or restricted
to the values they assume in the parent VAR model. In this connection,
note that, in light of @ref(eq:vecmint), the drift and the trend
coefficient in the conditional VECM @ref(eq:condsys) are defined as
\[\boldsymbol{\alpha_{0}}^{c}=\widetilde{\mathbf{A}}(1)(\boldsymbol{\mu}-\boldsymbol{\eta})+\widetilde{\boldsymbol{\Gamma}}(1)\boldsymbol{\eta}
, \enspace \enspace
\boldsymbol{\alpha_{1}}^{c}=\widetilde{\mathbf{A}}(1)\boldsymbol{\eta},\]
where \(\widetilde{\mathbf{A}}(1)\) is
as in @ref(eq:cond) and \(\widetilde{\boldsymbol{\Gamma}}(1)=\begin{bmatrix}
\boldsymbol{\gamma}_{y.x}'(1) \\ \boldsymbol{\Gamma}_{(x)}(1)
\end{bmatrix}\).
Accordingly, after partitioning the mean and the drift vectors as \[\underset{(1,K+1)}{\boldsymbol{\mu}'}=[\underset{(1,1)}{\mu_{y}},\underset{(1,K)}{\boldsymbol{\mu}_x'}],
\enspace
\underset{(1,K+1)}{\boldsymbol{\eta}'}=[\underset{(1,1)}{\eta_{y}},\underset{(1,K)}{\boldsymbol{\eta}_{x}'}],\]
the intercept and the coefficient of the trend of the ARDL equation
@ref(eq:ardl) are defined as \[\alpha_{0.y}^{EC}
= \mathbf{e}_{1}'\boldsymbol{\alpha_{0}}^{c}
=a_{yy}\mu_{y}-\widetilde{\mathbf{a}}'_{y.x}\boldsymbol{\mu}_{x}+\boldsymbol{\gamma}'_{y.x}(1)\boldsymbol{\eta}=a_{yy}(\mu_{y}-\boldsymbol{\theta}'\boldsymbol{\mu}_{x})+\boldsymbol{\gamma}'_{y.x}(1)\boldsymbol{\eta},
\enspace
\boldsymbol{\theta}'=-\frac{\widetilde{\mathbf{a}}'_{y.x}}{a_{yy}}\]
\[\enspace \enspace
\alpha_{1.y}^{EC}=\mathbf{e}_{1}'\boldsymbol{\alpha_{1}}^{c}=
a_{yy}\eta_{y}-\widetilde{\mathbf{a}}'_{y.x}\boldsymbol{\eta}_{x}=a_{yy}(\eta_{y}-\boldsymbol{\theta'}\boldsymbol{\eta}_{x}),\]
where \(\mathbf{e}_{1}\) is the \(K+1\) first elementary vector.
In the error correction term \[EC_{t-1}=y_{t-1}-\theta_{0}-\theta_{1}t-\boldsymbol{\theta}'\mathbf{x}_{t-1}\]
the parameters that partake in the calculation of intercept and trend
are \[\theta_{0}=\mu_{y}-\boldsymbol{\theta}'\boldsymbol{\mu}_{x},
\enspace \theta_{1}=\eta_{y}-\boldsymbol{\theta}'\boldsymbol{\eta}_{x}.\]
In particular, these latter are not null only when they are assumed to
be restricted in the model specification.
The five specifications proposed by Pesaran,
Shin, and Smith (2001) are
No intercept and no trend: \[\boldsymbol{\mu}=\boldsymbol{\eta}=\mathbf{0}.\] It follows that \[\theta_{0}=\theta_{1}=\alpha_{0.y}=\alpha_{1.y}=0.\] Accordingly, the model is as in @ref(eq:case1).
Restricted intercept and no trend: \[\boldsymbol{\alpha}_{0}^{c}= \widetilde{\mathbf{A}}(1)\boldsymbol{\mu},\enspace \enspace \boldsymbol{\eta}=\mathbf{0},\] which entails \[\theta_0 \neq 0 \enspace\enspace\alpha_{0.y}^{EC}=a_{yy}\theta_{0}, \enspace \enspace \alpha_{0.y}=\theta_{1}=\alpha_{1.y}=0.\] Therefore, the intercept stems from the EC term of the ARDL equation. The model is specified as in @ref(eq:case2)
Unrestricted intercept and no trend: \[\boldsymbol{\alpha}_{0}^{c}\neq\widetilde{\mathbf{A}}(1)\boldsymbol{\mu}, \enspace \enspace \boldsymbol{\eta}=\mathbf{0}.\] Thus, \[\alpha_{0.y}\neq 0,\enspace \enspace \theta_{0}=\theta_{1}=\alpha_{1.y}=0.\] Accordingly, the model is as in @ref(eq:case3).
Unrestricted intercept, restricted trend: \[\boldsymbol{\alpha_{0}}^{c}\neq\widetilde{\mathbf{A}}(1)(\boldsymbol{\mu}-\boldsymbol{\eta})+\widetilde{\boldsymbol{\Gamma}}(1)\boldsymbol{\eta}\enspace \enspace {\boldsymbol{\alpha}}_{1}^{c}=\widetilde{\mathbf{A}}(1)\boldsymbol{\eta},\] which entails \[\alpha_{0.y} \neq 0,\enspace \enspace \theta_{0}=0 \enspace \enspace \theta_{1}\neq 0\enspace\enspace \alpha_{1.y}^{EC}=a_{yy}\theta_1\enspace\enspace \alpha_{1.y}=0.\] Accordingly, the trend stems from the EC term of the ARDL equation. The model is as in @ref(eq:case4).
Unrestricted intercept, unrestricted trend: \[\boldsymbol{\alpha_{0}}^{c}\neq\widetilde{\mathbf{A}}(1)(\boldsymbol{\mu}-\boldsymbol{\eta})+\widetilde{\boldsymbol{\Gamma}}(1)\boldsymbol{\eta} \enspace \enspace {\boldsymbol{\alpha}}_{1}^{c}\neq\widetilde{\mathbf{A}}(1)\boldsymbol{\eta}.\] Accordingly, \[\alpha_{0.y} \neq 0 \enspace \enspace\alpha_{1.y} \neq 0, \enspace \enspace\theta_{0}=\theta_{1}=0.\] The model is as in @ref(eq:case5).
The R packages, either used in the creation
of bootCT
or employed in the analyses presented in this paper, are magrittr
(Bache and Wickham 2022), gtools
(Bolker, Warnes, and Lumley 2022), pracma
(Borchers 2022), Rcpp
(Eddelbuettel 2013), RcppArmadillo
(Eddelbuettel et al. 2023), Rmisc
(Hope 2022), dynamac
(Jordan and Philips 2020), ARDL
(Natsiopoulos and Tzeremes 2021), aod
(Lesnoff et al. 2012), vars
and urca
(Bernhard Pfaff 2008; B. Pfaff 2008), aTSA
(Qiu 2015), tseries
(Trapletti and Hornik 2023), reshape2,
ggplot2
and stringr
(Wickham 2007, 2016, 2022), tidyverse
and dplyr
(Wickham et al. 2019, 2023).↩︎
If the explanatory variables are stationary \(\mathbf{A}_{xx}\) is non-singular (\(rk(\mathbf{A}_{xx})=K\)), while when they are integrated but without cointegrating relationship \(\mathbf{A}_{xx}\) is a null matrix.↩︎
The knowledge of the rank of the cointegrating matrix is necessary to overcome this impasse.↩︎
The latter is introduced in the ARDL equation by the operation of conditioning \(y_t\) on the other variables \(\mathbf{x}_t\) of the model↩︎
In fact, as \(\boldsymbol{\omega}'\mathbf{A}_{xx}\mathbf{x}_{t} \approx I(0)\), the conclusion that \(y_{t}\approx I(0)\) must hold. This in turn entails that no cointegration occurs between \(y_t\) and \(\mathbf{x}_{t}\).↩︎
If the explanatory variables are stationary \(\mathbf{A}_{xx}\) is non-singular (\(rk(\mathbf{A}_{xx})=K\)), while when they are integrated but without cointegrating relationship \(\mathbf{A}_{xx}\) is a null matrix↩︎