GenMarkov: Modeling Generalized Multivariate Markov Chains in R

Carolina Vasconcelos

NOVA Information Management School (NOVA IMS)
cvasconcelos@novaims.unl.pt

Bruno Damásio

NOVA Information Management School (NOVA IMS)
bdamasio@novaims.unl.pt

2025-01-10

Abstract

This article proposes a new generalization of the Multivariate Markov Chains (MMC) model. The future values of a Markov chain commonly depend on only the past values of the chain in an autoregressive fashion. The generalization proposed in this work also considers exogenous variables that can be deterministic or stochastic. Furthermore, the effects of the MMC’s past values and the effects of pre-determined or exogenous covariates are considered in our model by considering a non-homogeneous Markov chain. The Monte Carlo simulation study findings showed that our model consistently detected a non-homogeneous Markov chain. Besides, an empirical illustration demonstrated the relevance of this new model by estimating probability transition matrices over the space state of the exogenous variable. An additional and practical contribution of this work is the development of a novel R package with this generalization.

Introduction

Multivariate Markov chains (MMC) have a wide range of applications, in various fields. Hence, several studies and generalizations of the MMC models have been made. However, the availability of packages that allow the estimation and application of these models are scarce, and most of these methods use algorithms and software that are not broadly available or can only be applied in particular situations. In the last few years, R software has been gaining importance in the field of statistical computing. This phenomenon might be because it is free and open-source software, which compiles and runs on a wide variety of operating systems. Specifically, in R software, there are some available packages related to Markov chains (MC) and MMC. For example, the package (Maitre and Emery 2020; Berchtold, Maitre, and Emery 2020) allows the computation of various Markovian models for categorical data, including homogeneous Markov chains of any order, MTD models, Hidden Markov models, and Double Chain Markov Models. Ogier Maitre developed this package with contributions from Andre Berchtold, Kevin Emery, Oliver Buschor, and Andre Berchtold maintains it. All the models computed by this package are for univariate categorical data. The package (Spedicato 2017) contains functions and methods to create and manage discrete-time Markov chains. In addition, it includes functions to perform statistical and probabilistic analysis (analysis of their structural proprieties). Finally, the package (Nicholson 2013) contains a series of functions that aid in both simulating and determining the properties of finite, discrete-time, discrete-state Markov chains. There are two main functions: DTMC and MultDTMC, which produce $n$ iterations of a Markov Chain(s) based on transition probabilities and an initial distribution given by the user, for the univariate and multivariate case, respectively. This last package is the only one available in R for MMC. In general, the work on MMC models is mostly based on improving the estimation methods and/or making the model more parsimonious. In this work, we aim to develop a new generalization that considers exogenous variables. Specifically, the effects of the MMC’s past values and the effects of pre-determined or exogenous covariates are considered in our model by considering a non-homogeneous Markov chain. Additionally, we address statistical inference and implement these methods in an R package. The R package includes three functions: multimtd, multimtd_probit and mmcx. The first two functions estimate the MTD model for multivariate categorical data, with Chings’s specification (Ching, Fung, and Ng 2002) and with the Probit specification (Nicolau 2014), respectively. The last function allows the estimation of our proposed model, the Generalized Multivariate Markov Chain (GMMC) model. The R package, , with these three functions is available in the Comprehensive R Archive Network (CRAN) at https://CRAN.R-project.org/package=GenMarkov.

Multivariate Markov chains

Markov chains can be appropriate for representing dependencies between successive observations of a random variable. However, when the order of the chain or the number of possible values increases, Markov chains have lack parsimony. In this context, Jacobs and Lewis (1978), Pegram (1980) and Logan (1981) proposed several models for HOMC. Notwithstanding these developments, the Mixture Transition Distribution model (Raftery 1985) proved to be more suitable to model HOMC, which overshadowed the previously proposed models. Several relevant extensions of the MTD model emerged: the Multimatrix MTD (Berchtold 1995, 1996), which allowed modeling the MTD by using a different $m \times m$ transition matrix for each lag, the Infinite-Lag MTD model that assumes an infinite lag order ($l = \infty$), which was first considered by Mehran (1989) and later developed by Le, Martin, and Raftery (1996) in a more general context. Finally, the MTD with General State Spaces allowed modeling more general processes with an arbitrary space state (Martin and Raftery 1987; Adke and Deshmukh 1988; Wong and Li 2001). Although the MTD model presents a more parsimonious approach to model Markov chains with order higher than one, it has weaknesses. Namely, when considering more than one data sequence, one represents the MMC as a HOMC, by expanding the state-space. This approach could result in a more complex probability transition matrix. Consequently, this can make the estimation unfeasible as the order, states, and the number of data sequences increase. Additionally, the model assumes the same transition matrix for each lag. In this setting, Ching, Fung, and Ng (2002) determined an alternative to handle the unfeasibility of the conventional multivariate Markov chain (MMC) by proposing a model with fewer parameters. The model developed is essentially the same as the MTD. However, it considers a different $m \times m$ transition matrix for each lag and considers more than one data sequence. In the proposed multivariate Markov chain model, Ching, Fung, and Ng (2002) assume the following relationship:

Let $x_t^{(j)}$ be the state vector of the $j$th sequence at time $t$. If the $j$th sequence is in state $l$ at time $t$ then

\[\begin{equation} x_{t+1}^{(j)} = \sum_{k=1}^s \lambda_{jk}P^{(jk)}x_{t}^{(k)}, \text{for } j =1, 2, \dots, s (\#eq:eq1) \end{equation}\] where $0 \leq \lambda_{jk} \leq 1$ for $j \leq s, k \leq s$ and $\sum_{k=1}^s \lambda_{jk} =1$ for $j=1, 2, \dots, s$. The $\lambda_{jk}$ can be interpreted as the mixing probability of the $j$th state to the $k$th state.

The state probability distribution of the $k$th sequence at time $(t + 1)$ depends on the weighted average of $P^{(jk)}x_{t}^{(k)}$ . Here $P^{(jk)}$ is a transition probability matrix from the states in the $k$th sequence to the states in the $j$th sequence and $x_t^{(k)}$ is the state probability distribution of the $k$th sequences at time $t$. In matrix form:

\[\begin{equation} \underline{x}_{t+1}^{(j)} \equiv \left[ \begin{array}{c} x_{t+1}^{(1)} \\ \vdots \\ x_{t+1}^{(s)} \end{array} \right ] = \left[ \begin{array}{ccc} \lambda_{11}P^{(11)} & \dots & \lambda_{1s}P^{(1s)}\\ \vdots & \ddots & \vdots\\ \lambda_{s1}P^{(s1)}& \dots & \lambda_{ss}P^{(ss)} \end{array} \right ] \left[ \begin{array}{c} x_{t}^{(1)} \\ \vdots \\ x_{t}^{(s)} \end{array} \right ] \equiv Q \underline{x}_{t} (\#eq:eq2) \end{equation}\] where $Q$ is an $ms \times ms$ block matrix ($s \times s$ blocks of $m \times m$ matrices) and $x_t$ is a stacked $ms$ column vector ($s$ vectors, each one with $m$ rows).

The matrices $P^{(jk)}$ can be estimated for each data sequence by counting the transition frequency from the states in the $k$th sequence to those in the $j$th sequence, obtaining the transition frequency matrix for the data sequence. After normalization, the estimates of the transition probability matrices, i.e., $\widehat{P}^{(jk)}$, are obtained. Regarding the $\lambda_{jk}$ coefficients, the estimation method proposed by Ching, Fung, and Ng (2002) involves the following optimization problem:

\[\begin{equation} min_{\lambda} max_{i} \vert [ \sum_{k=1}^m \lambda_{jk} \widehat{P}^{(jk)} \widehat{\boldsymbol{x}}^{(k)} - \widehat{\boldsymbol{x}}^{(j)} ] \vert (\#eq:eq3) \end{equation}\]

\[ \text{s.t. } \sum_{k=1}^s \lambda_{jk} \text{ and } \lambda_{jk} \geq 0 \] Besides this, different models have been proposed for multiple categorical data sequences. Kijima, Komoribayashi, and Suzuki (2002) proposed a parsimonious MMC model to simulate correlated credit risks. Siu et al. (2005) proposed an easy to implement model; however, its applicability was limited by the number of parameters involved. Ching, Ng, and Fung (2008) proposed a simplified model based on an assumption proposed in Zhang, King, and Hyndman (2006). Zhu and Ching (2010) proposed a method of estimation based on minimizing the prediction error with equality and inequality restrictions and Nicolau and Riedlinger (2014) proposed a new approach to estimate MMC which avoids imposing restrictions on the parameters, based on non-linear least squares estimation, facilitating the model estimation and the statistical inference. Berchtold (2003) proposed a MTD model for heteroscedastic time series. Lastly, Wang, Huang, and Ching (2014) proposed a new multivariate Markov chain model to reduce the number of parameters. Thus, generally, the models used in the published papers were developed by Ching, Fung, and Ng (2002) or were a consequent generalization of them and addressed the MMC as an end in itself. In Damásio (2013) and Damásio and Nicolau (2014), a different and innovative concept was proposed: the usage of MMC as regressors in a certain model. Hence, given that the MMC Granger causes a specific dependent variable, and taking advantage of the information about the past state interactions between the MMC categories, it was possible to forecast the current dependent variable more accurately. Other relevant contributions are related to the optimization algorithm, as in Lèbre and Bourguignon (2008) and Chen and Lio (2009), and to empirical applications (Ching, Fung, and Ng 2003; Ching and Ng 2006; Damásio 2018; Damásio and Mendonça 2019, 2020). Also, Damásio and Nicolau (2020) proposed a new methodology for detecting and testing the presence multiple structural breaks in a Markov chain occurring at unknown dates. In the vast majority of MMC models’ studies, a positive correlation between the different data sequences is assumed due to the restrictions imposed. This aspect means it is always considered that at moment $t$, an increase in a state probability for a data sequence has an increasing impact on another data sequence, for time $t+1$. Thereupon, if one has a negative correlation between series, the parameter estimates are forced to be zero. The solution to this problem is very straightforward; one can relax the assumptions and not assume the constraints. However, that means the results produced by the model will no longer be probabilities. Raftery and Tavaré (1994) presented an alternative, by dropping the positivity condition and imposing another set of restrictions. Ching, Ng, and Fung (2008) also tackled this issue and proposed a method where one splits the $Q$ matrix into the sum of two other matrices and one represents the positive correlations and another the negative correlations. Also, in Nicolau (2014), a specification completely free from constraints, inspired by the MTD model, was proposed, facilitating the estimation procedure and, at the same time, providing a more accurate specification for $P_j(i_0 | i_1, \dots, i_s)$. The model was:

\[\begin{equation} P_j(i_0 | i_1, \dots, i_s) = P_j^{\Phi}(i_0 | i_1, \dots, i_s) := \\ \frac{\Phi(\eta_{j0} + \eta_{j1}P(i_0|i_1) + \dots + \eta_{js}P(i_0|i_s))}{\sum_{k=1}^m \Phi(\eta_{j0} + \eta_{j1}P(k|i_1) + \dots + \eta_{js}P(k|i_s))} (\#eq:eq4) \end{equation}\] where $n_{ji} \in \mathbb{R}(j = 1, \dots, s; i = 1, \dots, m)$ and $\Phi$ is the (cumulative) standard normal distribution function.

This specification is denoted as and MTD-Probit model. The log-likelihood is given by: \[\begin{equation} LL = \sum_{i_1, i_2, \dots, i_{i_s}, i_0} n_{i_1, i_2, \dots, i_{i_s}, i_0} log(P_j^{\Phi}(i_0 | i_1, \dots, i_s) ) (\#eq:eq5) \end{equation}\] and the maximum likelihood estimator is defined, as usual, as $\widehat{\eta} = \text{arg max}_{n_{j1}, \dots, n_{js}} LL$. The parameters $P_{jk}(i_0|i_1)$, $k$ =$1, \dots, s$ can be estimated in advance, through the consistent and unbiased estimators proposed by Ching, Fung, and Ng (2002):

\[\begin{equation} \widehat{P}_{jk}(i_0|i_1) = \frac{n_{i_1i_0}}{\sum_{i_0=1}^n n_{i_1 i_0}} (\#eq:eq6) \end{equation}\] This specification can be superior to the MTD because the estimation procedure is easier, and the standard numerical optimization routines can be easily applied in the absence of constraints. However, similarly to the standard MTD, the likelihood is not a strictly concave function on the entire parameter state-space, thus the choice of starting values is still important. Additionally, the model describes a broader range of possible dependencies since the parameters are not constrained. Moreover, this proposed model is more accurate than the MTD model. For more details on this, see Nicolau (2014).

Overall, the published work on MMC models was mostly based on improving the estimation methods and/or making the model more parsimonious. In Damásio (2013) and Damásio and Nicolau (2014), a different approach was used, and the work developed focused on the usage of MMC as regressors in a certain model. Notably, it showed that an MMC can improve the forecast of a dependent variable. In a way, it demonstrated that an MMC can be an end in itself, but it can be an instrument to reach an end or a purpose. In this work, the opposite will be developed: instead of considering an MMC as regressors, a model in which a vector with pre-determined exogenous variables is part of $\mathcal{F}_{t-1}$ is proposed.

Covariates in Markov chain models

Regarding the inclusion of covariates in Markov chains models, Regier (1968) proposed a two-state Markov chain model, where the transition matrix probabilities were a function of a parameter, $q$, that described the tendency of the subject to move from state to state. Kalbfleisch and Lawless (1985) proposed a panel data analysis method under a continuous-time Markov model that could be generalized to handle covariate analysis and the fitting of certain non-homogeneous models. This work overcame the limitations of Bartholomew (1968), Spilerman and Singer (1976) and Wasserman (1980) methodologies, by developing a new algorithm that provided a very efficient way of obtaining maximum likelihood estimates. Also, Muenz and Rubinstein (1985) developed a Markov model for covariates dependence of binary sequences, where the transitions probabilities were estimated through two logistic regressions that depended on a set of covariates. Essentially, Muenz and Rubinstein (1985) modeled a non-homogeneous Markov chain through logistic regression, considering only two states. Islam, Arabia, and Chowdhury (2004) developed an extension of this model considering three states, and Islam and Chowdhury (2006) generalized this approach for HOMC. Additionally, Azzalini (1994) proposed a model to study the influence of time-dependent covariates on the marginal distribution of a binary response in serially correlated binary data, where Markov chains are expressed in terms of transitional probabilities. Jackson (2011) proposed a Markov model for panel data, which allowed for the transitions intensities to vary between individuals or constant time-dependent covariates. Specifically, this work allowed to account for different intensities throughout transitions of states and include individual-specific covariates. The time-inhomogeneos model proposed is restricted to piecewise-constant intensities. The implementation of this work is available in the package . More recently, Bolano (2020) proposed an MTD-based approach to handle categorical covariates, that considers each covariate separately and combines the effects of the lags of the MTD and the covariates employing a mixture model. Specifically, the model is given by:

\[\begin{equation} P(X_t = k \mid X_{t-1} = i, C_1 = c_1, \dots, C_l = c_l) \approx \theta_0 a_{ik} + \sum_{h=1}^l \theta_h d_{c_{h}k} (\#eq:eq7) \end{equation}\]

where $a_{ik}$ is the transition probability from state $i$ to state $k$, as in a conventional Markov chains and $d_{c_{h}k}$ is the probability of observing the states $k$ given the modality $c_h$ of the covariate $h$. Lastly, $\theta_0, \dots, \theta_l$ are the weights of the explanatory elements of the model.

According to the literature presented, several researchers have proposed methodologies or generalizations to include covariates in Markov chain models. Primarily for social sciences and health applications, where the transition probabilities were generally modeled through logistic regression. However, there has been an increased focus on categorical covariates, opposing continuous covariates and a lack of approaches to multivariate Markov chain models. Thus, with this work, we aim to tackle this research gap.

Multivariate Markov chains with covariates

Theoretical model

In this work, a new generalization of Ching, Fung, and Ng (2002) MMC model is presented: the GMMC model, that is, we will consider exogeneous or pre-determined covariates in the $\sigma$ - algebra generated by the available information until $t-1$ ($\mathcal{F}_{t-1}$). These variables can be deterministic or stochastic and do not necessarily need to be reported at time $t$. Broadly, the model is given by:

\[\begin{equation} P(S_{jt} = k | \mathcal{ F}_{t-1} ) = P(S_{jt} = k | S_{1t-1} = i_1, S_{2t-1} = i_2, \dots, S_{st-1} = i_s, \boldsymbol{x}_t) (\#eq:eq8) \end{equation}\] We can specify this model as proposed by Ching, Fung, and Ng (2002) with Raftery’s notation:

\[\begin{multline} P(S_{jt} = i_0 | S_{1t-1} = i_1,\dots, S_{st-1} = i_s, \boldsymbol{x}_t) \equiv \\ \lambda_{j1}P(S_{jt} = i_0 | S_{1t-1} = i_1,\boldsymbol{x}_t) + \dots + \lambda_{js}P(S_{jt} = i_0 | S_{st-1} = i_s, \boldsymbol{x}_t) (\#eq:eq9) \end{multline}\] subject to the usual constraints.

Estimation and inference

This proposed model is estimated through MLE, similar to the standard MTD model. The log-likelihood is given by:

\[\begin{equation} LL = \sum_{t = 1}^n log P(S_{jt} = i_0 | S_{1t-1} = i_1, \dots, S_{st-1} = i_s, \boldsymbol{x}_t) (\#eq:eq10) \end{equation}\]

Additionally, the probabilities can be estimated through an multinomial logit model. The proof for consistency and asymptotic distribution is available in the Supplementary Material section.

Monte Carlo simulation study

A Monte Carlo simulation study was designed to evaluate the dimension and power of the test parameters of the proposed model. The R statistical environment was used for all computations. This simulation study was comprised of two parts.

Part I: Detect a non-homogeneous Markov chain

First, we considered two sequences with two and three states. The main goal was to assess if the model detected the presence of a non-homogeneous Markov chain correctly and if the estimate of the parameter would correspond to the expected. So, given two sequences, one generated through a non-homogeneous Markov chain and the other generated through a homogeneous Markov chain, it would be expected that the parameter associated with the transition probabilities of the first sequence would be one and the parameter associated with the transition probabilities of the second sequence would be zero. With this in mind, the transitions probabilities of the first sequence were estimated through a logistic regression, where parameters of this regression were randomly generated in R, and the second sequence was generated through a first-order Markov chain. Hence, for both states cases considered, it was expected that the estimated regression would be:

\[\begin{multline} P(S_{1t} = i_0 | S_{1t-1} = i_1, S_{2t-1} = i_2, \boldsymbol{x}_{t-1}) = \\ 1 \times P(S_{1t} = i_0 | S_{1t-1} = i_1,\boldsymbol{x}_{t-1}) + 0 \times P(S_{1t} = i_0 | S_{2t-1} = i_2, \boldsymbol{x}_{t-1}) (\#eq:eq11) \end{multline}\]

To assess the test power and dimension, we used the Wald test with the following hypothesis:

Power and dimension of test assessment
	Hypothesis	Test
Power	$H_0: \lambda_{11} = 0$	$\frac{\widehat{\lambda}_{11}^2}{se(\widehat{\lambda}_{11})^2} \sim \chi^2_{(1)}$
	$H_0: \lambda_{12} = 1$	$\frac{(\widehat{\lambda}_{12}-1)^2}{se(\widehat{\lambda}_{12})^2} \sim \chi^2_{(1)}$
Dimension	$H_0: \lambda_{11} = 1$	$\frac{(\widehat{\lambda}_{11}-1)^2}{se(\widehat{\lambda}_{11})^2} \sim \chi^2_{(1)}$
	$H_0: \lambda_{12} = 0$	$\frac{\widehat{\lambda}_{12}^2}{se(\widehat{\lambda}_{12})^2} \sim \chi^2_{(1)}$

The simulation procedure was performed as follows:

Generate the values of the coefficients for the probability transition matrix of series $S_{1t}$ randomly;
Generate the probability transition matrix of series $S_{2t}$ randomly;
Set the initial value of $S_{2t}$ to 1 and simulate the following from the defined probability transition matrix;
In each iteration (of 1000 repetitions),
- Generate $X_t \sim N(2,25)$;
- Generate the time-varying probabilities of series $S_{1t}$ through the values of the fixed coefficients and the lagged variable $x_t$;
- Set the initial values of the series $S_{1t}$ as 1;
- For each period $t$, simulate the next state of $S_{1t}$ from the probabilities simulated for that moment;
- Estimate the model through the function mmcx;
- Calculate the Wald test and add to the counter if it is rejected.

Simulation study results for two-states, displaying the proportion of rejections of the null hypothesis for two parameter values. Dimension of test remains stable regardless sample size. Power of test increases with sample size. The proposed model detects the presence of non-homogenenous Markov Chain.

Considering two states, the test dimension was at 5.7% with a sample size of 100 observations, sightly increased with 500 observations, and returned to the expected values in 1000 and 5000 observations. For a sample size of 100, 500, and 1000 observations, we have low test power. So, when considering two states, the sample must have at least 5000 observations, or, if that is not possible, consider a higher significance level when testing for individual significance.

Simulation study results for three-states, displaying the proportion of rejections of the null hypothesis for two parameter values. Dimension of test decreases as sample size increases. Power of test is stable regardless of sample size. The proposed model detects the presence of non-homogenenous Markov Chain.

Considering three states, the test dimension was 9.7% for a sample size of 100 observations, 0.2% for a sample size of 500 observations, and 0.3% for a sample size of 1000. Regarding the test power, we see similar behavior, for a sample of 100 observations, the test power was 90.5%, and from a sample of 500 observations, we reach a test power of 100%. Thus, when considering three states, one may consider a sample of 500 observations without compromising the test power and dimension.

Part II: Detecting Parameters Assigned Values

Secondly, we performed a simulation study where we considered two non-homogeneous Markov chain with two states. Here, the main goal was to assess if the model correctly detected the parameters assigned. So, in this case, we started by generating the terms of the model proposed. These terms were estimated through logistic regression, and the parameters of this regression were randomly generated in R. Similarly to Part I, we considered a Wald test to assess the power and dimension of the test. The simulation procedure was performed as follows:

Generate the values of the coefficients to calculate the probability transition matrices randomly;
In each iteration (of 1000 repetitions),
- Generate $\{x_t\} \sim N(2,25)$;
- Generate the probabilities $P \left(S_{jt}|S_{st-1}, x_{t-1} \right)$, with $j=1,2$ and $s=1,2$.
- Set the initial values of the series $S_{1t}$ and $S_{2t}$ as 1;
- For each period $t$, calculate the probabilities $P \left(S_{1t}|S_{1t-1}, S_{2t-1}, x_{t-1} \right)$ and $P \left( S_{2t}|S_{1t-1}, S_{2t-1}, x_{t-1} \right)$ through the assigned values of the $\lambda$’s. Considering the calculated probabilities, simulate the next state for each series, $S_{1t}$ and $S_{2t}$.
- Estimate the model through the function mmcx;
- Calculate the Wald test and add to the counter if it is rejected.

The probabilities $P\left(S_{1t}|S_{1t-1}, x_{t-1} \right)$ and $P\left(S_{1t}|S_{2t-1}, x_{t-1}\right)$ presented some differences regarding its values’ distributions. Specifically, $P\left(S_{1t}|S_{1t-1}, x_{t-1} \right)$ had more extreme probabilities values, with the minimum value being close to 0 and the maximum value being close to 1. And, the probabilities $P\left(S_{1t}|S_{2t-1}, x_{t-1} \right)$ had more moderate values, with the minimum value being, on average, 0.3 and the maximum value, 0.7. When the probabilities have values close to 1, one says that the states/regimes are persistent. We calculated the power and dimension of test for each value of $\lambda$ when the estimated probabilities are moderate and when they are extreme. Hence, considering equation 1:

\[\begin{multline} P\left(S_{1t} = i_0 | S_{1t-1} = i_1,\dots, S_{2t-1} = i_2, \boldsymbol{x}_{t-1} \right) = \\ \lambda_{11}P\left(S_{1t} = i_0 | S_{1t-1} = i_1,\boldsymbol{x}_{t-1}\right) + \lambda_{12}P\left(S_{1t} = i_0 | S_{2t-1} = i_s, \boldsymbol{x}_{t-1} \right) (\#eq:eq12) \end{multline}\]

The parameter $\lambda_{11}$ will be associated with more extreme probabilities and $\lambda_{12}$ will be associated with more moderate probabilities.

Simulation study results for persistent states on low values of the parameters (case 1), displaying the proportion of rejections of the null hypothesis for four parameter values. Dimension decreases as sample size increases. Power of test increases with sample size. The proposed model has low power of test when low parameter values are associated with persistent states.

Simulation study results for persistent states on high values of the parameters (case 2), displaying the proportion of rejections of the null hypothesis for four parameter values. Dimension and power of test increase as sample size increases. The results point towards a low test power in this setting.

When the states are persistent and the parameter’s value is low (i.e., 0.2 and 0.4), we have low test power. By increasing this value, the power of test increases as well. When the states are not persistent, we do not have a clear pattern regarding the power of test, for a value of the parameter of 0.2, the power of test is still low (although not as low as the first scenario), increases when we have a value of 0.4, decreases when the value is 0.6 and increases again when the value is 0.8. Overall, the estimated standard errors seem high, leading to low test power. Regarding the test dimension, when we have a higher weight associated with the non-persistent states, the test dimension converges to 0. However, when this weight is associated with the persistent states, the test dimension increases with the sample size, reaching a value of 10% in some cases. Hence, one must use a 10% significance level to perform statistical inference on the parameters in this situation.

Software implementation

Regarding the software implementation for each function, for the multimtd function the estimation method was presented in Berchtold (2001) applied to the multivariate case. For multimtd_probit, a package for numerical maximization of the log-likelihood, (Henningsen and Toomet 2011), was used. This package performs Maximum Likelihood estimation through different optimization methods that the user can choose. The optimization methods available are Newton-Raphson, Broyden - Fletcher - Goldfarb - Shanno, BFGS al- algorithm, Berndt - Hall - Hall - Hausman, Simulated ANNealing, Conjugate Gradients, and Nelder-Mead. Finally, for the mmcx function, a different approach was used. Unlike the MTD- Probit, the model proposed has equality and inequality restrictions in the parameters. The (Henningsen and Toomet 2011) package only allows one type of restriction for each Maximum Likelihood estimation, so it was not possible to use this package to estimate the proposed model with exogenous variables. Hence, the algorithm used was the Augmented Lagrangian method, available in the (Varadhan 2015) package through the function auglag. This estimation method for the proposed model is not very common, however, it has been applied to Markov chain models (Rajarshi 2013). The GMMC model’s probabilities were estimated through a Multinomial Logit using rmultinom of the package (Venables and Ripley 2002).

Additionally, the hessian matrices were also computed, which allowed performing statistical inference. The maxLik and auglag compute the Hessian matrices with the estimates. For the function multimtd, since the optimization procedure of Berchtold (2001) was used, the hessian was computed through the second partial derivatives. The function multi.mtd requires the following elements:

y, a matrix of the categorical data sequences.
deltaStop, the delta below which the optimization phases of the parameters stop.
is_constrained, flag indicating whether the function will consider the usual set of constraints (usual set: , new set of constraints: ).
delta, the amount of change to increase/decrease in the parameters for each iteration of the optimization algorithm.

The last three arguments concern the optimization procedure. For more details see Berchtold (2001). Considering two vectors of two categorical data sequences, s1 and s2, to estimate the model and obtain the results:

multi.mtd(y=cbind(s1,s2), deltaStop=0.0001, is_constrained=TRUE, delta=0.1)

The function multi.mtd_probit requires the following arguments:

y, a matrix of the categorical data sequences.
initial, a vector of the initial values of the parameters.
nummethod, the numerical maximization method, currently either “NR” (for Newton-Raphson), “BFGS” (for Broyden-Fletcher-Goldfarb-Shanno), “BFGSR” (for the BFGS algorithm implemented in R), “BHHH” (for Berndt-Hall-Hall-Hausman), “SANN” (for Simulated ANNealing), “CG” (for Conjugate Gradients), or “NM” (for Nelder-Mead). Lower-case letters (such as “nr” for Newton-Raphson) are allowed. The default method is “BFGS”. For more details see (Henningsen and Toomet 2011) package.

Considering two vectors of two categorical data sequences, s1 and s2 again, to estimate the model an obtain the results with BFGS maximization method:

multi.mtd_probit(y = cbind(s1,s2), initial=c(1,1,1), nummethod='bfgs')

Finally, the function mmcx requires the following elements:

y, a matrix of categorical data sequences.
x, a matrix of covariates (exogeneous variables).
initial, a vector of the initial values of the parameters.

Considering two vectors of two categorical data sequences, s1 and s2, and a vector of an exogeneous variables, x, to estimate the model and obtain the results:

mmcx(y = cbind(s1,s2), x = cbind(x), initial=c(1,1))

These functions return a list with the parameter estimates, standard errors, z-statistics, p- values, and the log-likelihood function value for each equation.

The package offers an additional function that allows to obtain the transition probability matrices of mmcx considering a specific value of x defined by the user. The function is MMC_tpm and requires the following elements:

s, a matrix of categorical data sequences.
x, a matrix of covariates (exogeneous variables).
value, a single value of x, to condition the probability transition matrices.
result, a list returned by the function mmcx containing the model’s estimates.

Considering two vectors of two categorical data sequences, s1 and s2, a vector of an exogeneous variables, x and res the list returned by the function mmcx, to obtain the transition probability matrices:

MMC_tpm(s = cbind(s1,s2), x = cbind(x), value = max(x), result = res)

The function returns an array containing the probability transition matrices, conditioned on a specific value of x, for each equation.

Illustration

Markov chain models are used in interdisciplinary areas, such as economics, business, biology, and engineering, with applications to predict long-term behavior from traffic flow to stock market movements, among others. Modeling and predicting stock markets returns is particularly relevant for investors and policy makers. Since the stock market is a volatile environment, and the returns are difficult to predict, estimating the set of probabilities that describe these movements, might provide relevant input. Additionally, incorporating the effect of key macroeconomic variables could provide a more accurate picture of this specific environment.

The following empirical illustration aims to model stock returns of two indexes as a function of the interest rate spread, specifically the 10-Year Treasury Constant Maturity Minus 3-Month Treasury Constant Maturity.

The interest rate spread is a key macroeconomic variable and provides valuable information regarding the economy state. Specifically, it has been used to forecast recessions as in Estrella and Mishkin (1996), Dombrosky and Haubrich (1996), Chauvet and Senyuz (2016), Tian and Shen (2019) and McMillan (2021). Generically, short-term yields are lower than long-term yields when the economy is in expansion. On the other hand, short-term yields are higher than long-term yields when the economy is in recession. The difference between these yields (or, more specifically, the yield curve’s slope) can be used to forecast the state of the economy. Hence, this indicator might provide relevant input for investors.

We considered the 5-week-day daily stock returns ($r_t=100 \times \log(P_t/P_{t-1})$, where $P_t$ is the adjusted close price) of two indexes, S&P500 and DJIA, from November $11^{th}$ 2011 to September $1^{st}$ 2021 (2581 observations). Additionally, we considered the interest rate spread ($spread_{t}$), the 10-Year Treasury Constant Maturity Minus 3-Month Treasury Constant Maturity. The data was retrieved from FRED. Below, we have the descriptive statistics of these variables.

Summary statistics of $stockreturns$ dataset
Variable	Minimum	1$^{st}$ Quantile	Median	Mean	3$^{rd}$ Quantile	Maximum
$spread_{t}$	-0.52	0.92	1.54	1.454	2.03	2.97
$r_{t;SP500}$	-12.765	-0.32	0.07	0.054	0.518	8.968
$r_{t;DJIA}$	-13.842	-0.327	0.071	0.046	0.508	10.764

Moreover, to apply the model proposed, it is necessary to have a categorical time series, thus we applied the following procedure:

\[ S_{st}= \begin{cases} 1, r_t \leq \widehat{q}_{s;0.25}\\ 2, \widehat{q}_{s;0.25} < r_t < \widehat{q}_{s;0.75} \\ 3, r_t \geq \widehat{q}_{s;0.75}\\ \end{cases} \]

where $\widehat{q}_{s;\alpha}$ is the estimated quantile of order $\alpha$ of the marginal distribution of $r_t$. Considering this illustration and the model proposed, we will have two equations:

\[\begin{multline} P(S_{sp500,t} | S_{sp500, t-1}, S_{djia, t-1}, spread_{t-1}) = \\ \lambda_{11} P(S_{sp500,t} | S_{sp500, t-1}, spread_{t-1}) + \lambda_{12} P(S_{sp500,t} | S_{djia, t-1}, spread_{t-1}) (\#eq:eq13) \end{multline}\]

\[\begin{multline} P(S_{djia,t} | S_{sp500, t-1}, S_{djia, t-1}, spread_{t-1}) = \\ \lambda_{21} P(S_{djia,t} | S_{sp500, t-1}, spread_{t-1}) + \lambda_{22} P(S_{djia,t} | S_{djia, t-1}, spread_{t-1}) (\#eq:eq14) \end{multline}\]

In Figures @ref(fig:fig11) to @ref(fig:fig22) generate through (Wickham 2016) and (Auguie 2017), we have the smoothed conditional probabilities of both series, depending on $spread_{t-1}$. The number of observations is high, and the probabilities varied abruptly in a small time frame, making the plots hard to read. To simplify, a moving average model (from (Borchers 2022)) of order 5, due to the frequency of the data, was adjusted to these probabilities to illustrate how they evolve throughout time. These plots represent the probabilities associated with the parameters of the general model proposed, showcasing how these vary throughout time and the main of advantage of this generalization. Instead of having fixed matrices of transition probabilities, we allow for these to vary throughout time, depending on the values of $spread_{t-1}$. Specifically, Figures @ref(fig:fig11) and @ref(fig:fig12) correspond to the non-homogeneous Markov chain to build the SP&500’s equation and Figures @ref(fig:fig21) and Figures @ref(fig:fig22) correspond to the non-homogeneous Markov chain to build DJIA’s equation. We see a similar behavior within each series regardless of whether it depends on the previous states of $S_{1t}$ or $S_{2t}$. Additionally, the scales of the graphs are small, indicating that these probabilities vary around the same set of values.

$Estimated conditional probabilities of series 1 (SP500) depending on $spread_{t-1}$ and on series 1 (SP500) previous state: $P(S_{sp500,t} | S_{sp500, t-1}, spread_{t-1})$. This figure shows the estimated non-homogeneous Markov chain from which the realized probabilites will be extracted to maximize the log-likelihood function.$

Estimated conditional probabilities of series 1 (SP500) depending on $spread_{t-1}$ and on series 1 (SP500) previous state: $P(S_{sp500,t} | S_{sp500, t-1}, spread_{t-1})$. This figure shows the estimated non-homogeneous Markov chain from which the realized probabilites will be extracted to maximize the log-likelihood function.

$Estimated conditional probabilities of series 1 (SP500) depending on $spread_{t-1}$ and on series 2 (DJIA) previous state: $P(S_{sp500,t} | S_{djia, t-1}, spread_{t-1})$. This figure shows the estimated non-homogeneous Markov chain from which the realized probabilites will be extracted to maximize the log-likelihood function.$

Estimated conditional probabilities of series 1 (SP500) depending on $spread_{t-1}$ and on series 2 (DJIA) previous state: $P(S_{sp500,t} | S_{djia, t-1}, spread_{t-1})$. This figure shows the estimated non-homogeneous Markov chain from which the realized probabilites will be extracted to maximize the log-likelihood function.

$Estimated conditional probabilities of series 2 (DJIA) depending on $spread_{t-1}$ and on series 1 (SP500) previous state: $P(S_{djia,t} | S_{sp500, t-1}, spread_{t-1})$. This figure shows the estimated non-homogeneous Markov chain from which the realized probabilites will be extracted to maximize the log-likelihood function.$

Estimated conditional probabilities of series 2 (DJIA) depending on $spread_{t-1}$ and on series 1 (SP500) previous state: $P(S_{djia,t} | S_{sp500, t-1}, spread_{t-1})$. This figure shows the estimated non-homogeneous Markov chain from which the realized probabilites will be extracted to maximize the log-likelihood function.

$Estimated conditional probabilities of series 2 (DJIA) depending on $spread_{t-1}$ and on series 2 (DJIA) previous state: $P(S_{djia,t} | S_{djia, t-1}, spread_{t-1})$. This figure shows the estimated non-homogeneous Markov chain from which the realized probabilites will be extracted to maximize the log-likelihood function.$

Estimated conditional probabilities of series 2 (DJIA) depending on $spread_{t-1}$ and on series 2 (DJIA) previous state: $P(S_{djia,t} | S_{djia, t-1}, spread_{t-1})$. This figure shows the estimated non-homogeneous Markov chain from which the realized probabilites will be extracted to maximize the log-likelihood function.

The model can be estimated through the mmcx function:

attach(stockreturns)
res <- mmcx(cbind(sp500, djia), spread_1, initial=c(1,1))

## --------------------------------------------
## Equation 1 
##   Estimate Std. Error t value Pr(>|t|)    
## 1 0.685660   0.171350   4.002    0.000 ***
## 2 0.314340   0.171350   1.834    0.067 *  
## 
## Log-Likelihood: -2636.355 
## --------------------------------------------
## --------------------------------------------
## Equation 2 
##   Estimate Std. Error t value Pr(>|t|)    
## 1 0.629992   0.176438   3.571    0.000 ***
## 2 0.370008   0.176438   2.097    0.036 ** 
## 
## Log-Likelihood: -2636.622 
## --------------------------------------------

Considering the first equation, the effect of the probabilities depending on S&P500’s previous state and the interest rate spread has a higher weight on the overall probability. Also, this estimate is highly significant, presenting a $p$-value close to zero. The effect of DJIA’s previous state in S&P500 is lower but it is also significant for a 10% significance level. In the second equation, the effect of S&P500’s previous state is higher than DJIA’s and both estimates are highly significant.

One of the advantages of this approach is the possibility to assess the transition probabilities for specific values of $x_t$, in this case, the interest rate spread. For both series, we calculated the transition probabilities for this variable’s minimum and maximum value in the sample, which are -0.52 and 2.97, respectively. To obtain the probability transition matrices for these two cases, the code is the following:

tpm_max <- MMC_tpm(cbind(sp500, djia), spread_1, 
                   value = max(spread_1), result = res)

tpm_min <- MMC_tpm(cbind(sp500, djia), spread_1, 
                   value = min(spread_1), result = res)

library(markovchain)
plot(new('markovchain', transitionMatrix = tpm_max[,,1])) # Generate figure 9
plot(new('markovchain', transitionMatrix = tpm_min[,,1])) # Generate figure 10
plot(new('markovchain', transitionMatrix = tpm_max[,,2])) # Generate figure 11
plot(new('markovchain', transitionMatrix = tpm_min[,,2])) # Generate figure 12

In Figures @ref(fig:fig-sp500-min) and @ref(fig:fig-sp500-max), we have the transition probabilities network for S&P500, corresponding to the minimum and maximum value of the spread. The most noticeable difference between these two networks is regarding the transition probability from the second state to the third state. For the maximum value of $spread_{t-1}$, the transition probability from the second state to the third state is 0.6. So, when the economy is strong, one might expect to have higher returns, when $t-1$ was in the second state. However, this scenario shifts when considering the minimum value of $spread_{t-1}$. The probability of obtaining higher returns, that is, being in state three, becomes almost evenly distributed, regardless of the state in $t-1$. This indicates the instability of the stock market, when the economy is weaker. Another difference in these networks, is regarding the transition probability from the third state to the first state. For the maximum value of $spread_{t-1}$, this probability is 0.27 and for the minimum value increases to 0.44. This is also expected, since when the economy is weaker, the probability of having lower returns is greater.

$Graphical representation of the transition probability matrix of Series 1: SP500 for the maximum value of spread$_{t-1}$. The highest probability of 0.6 refers to the transition from state 2 to state 3.$

Graphical representation of the transition probability matrix of Series 1: SP500 for the maximum value of spread$_{t-1}$. The highest probability of 0.6 refers to the transition from state 2 to state 3.

$Graphical representation of the transition probability matrix of Series 1: SP500 for the minimum value of spread$_{t-1}$. The highest probability of 0.56 refers to the transition from state 2 to state 2.$

Graphical representation of the transition probability matrix of Series 1: SP500 for the minimum value of spread$_{t-1}$. The highest probability of 0.56 refers to the transition from state 2 to state 2.

Considering the second equation (Figures $\ref{fig:fig-djia-max}$ and $\ref{fig:fig-djia-min}$), corresponding to the DJIA’s returns, we see a similar behaviour as in S&P500’s networks. The transition probability from the second state to the third state is higher for the maximum value of $spread_{t-1}$ and the transition probability from the third state to the first state is higher when we consider the minimum value of $spread_{t-1}$. Although, the difference of this last probability between the minimum and maximum value of $spread_{t-1}$ is not as big as in S&P500. Overall, the rest of the probabilities structure, remains the same.

$Graphical representation of the transition probability matrix of Series 2: DJIA for the maximum value of spread$_{t-1}$. The probability of 0.58 refers to the transition from state 2 to state 3.$

Graphical representation of the transition probability matrix of Series 2: DJIA for the maximum value of spread$_{t-1}$. The probability of 0.58 refers to the transition from state 2 to state 3.

$Graphical representation of the transition probability matrix of Series 2: DJIA for the minimum value of spread$_{t-1}$. The highest probability of 0.51 refers to the transition from state 2 to state 2.$

Graphical representation of the transition probability matrix of Series 2: DJIA for the minimum value of spread$_{t-1}$. The highest probability of 0.51 refers to the transition from state 2 to state 2.

Conclusions, limitations and further research

Several proposals for including of exogenous variables in MMC models have been presented. The main limitations were associated with the high complexity of the models to be developed and estimated. Additionally, most models considered only categorical exogenous variables, existing a lack of focus on continuous exogenous variables. This work proposes a new approach to include continuous exogenous variables in Ching, Fung, and Ng (2002) model for multivariate Markov chain. This is relevant because it allows studying the effect of previous series and exogenous variables on the transition probabilities. The model is based on Ching, Fung, and Ng (2002) MMC model but considers non-homogeneous Markov chains. Thus, the probabilities that compose the model are dependent on exogenous variables. These probabilities are estimated as a usual non-homogeneous Markov chain through a multinomial logit model. The model parameters are then estimated through MLE, as well as the standard errors. We developed a package with the estimation function of the model proposed. In this, we considered the Augmented Lagrangian optimization method for estimating the parameters through MLE. Additionally, we designed a Monte Carlo simulation study to assess this model’s test power and dimension. The results showed that the model detected a non-homogeneous Markov chain. Moreover, an empirical illustration demonstrated the relevance of this new model by estimating the probability transition matrix for different exogenous variable values. Ignoring the effect of exogenous variables in MMC means that we would not detect the probabilities’ changes according to the covariates’ values. In this setting, one would have a limited view of the studied process. Hence, this approach allows us to understand how a specific variable influences a specific process. The main contributions of this work are the development of a package with functions for multivariate Markov chains, addressing the statistical inference in these models and the inclusion of covariates. The limitations are related to the implementation in R, specifically the optimization algorithm applied is not common for MMC models, in that sense, it would be beneficial to study new approaches to optimizing the maximum likelihood function as further research. Additionally, extending this generalization to the MTD-probit model proposed by Nicolau (2014) would also be relevant, which removes the constraints of the model’s parameters and allows the model to detect negative effects.

Adke, S. R., and S. R. Deshmukh. 1988. “Limit Distribution of a High Order Markov Chain.” Journal of the Royal Statistical Society. Series B (Methodological) 50 (1): 105–8. https://www.jstor.org/stable/2345812.

Auguie, Baptiste. 2017. gridExtra: Miscellaneous Functions for "Grid" Graphics. https://CRAN.R-project.org/package=gridExtra.

Azzalini, A. 1994. “Logistic regression for autocorrelated data with application to repeated measures.” Biometrika 81 (4): 767–75. https://doi.org/10.1093/biomet/81.4.767.

Bartholomew, J. 1968. “Stochastic Models for Social Processes.” The Australian and New Zealand Journal of Sociology 4 (2): 171–72. https://doi.org/https://doi.org/10.1177/144078336800400215.

Berchtold, A. 1995. “Autoregressive Modelling of Markov Chains.” Proc. 10th International Workshop on Statistical Modelling 104: 19–26. https://doi.org/10.1007/978-1-4612-0789-4_3.

———. 1996. “Modélisation autorégressive des chaines de Markov : utilisation d’une matrice différente pour chaque retard.” Revue de Statistique Appliquée 44 (3): 5–25. http://www.numdam.org/item/RSA_1996__44_3_5_0/.

———. 2001. “Estimation in the Mixture Transition Distribution Model.” Journal of Time Series Analysis 22 (4): 379–97. https://doi.org/https://doi.org/10.1111/1467-9892.00231.

———. 2003. “Mixture transition distribution (MTD) modeling of heteroscedastic time series.” Computational Statistics and Data Analysis 41 (3-4): 399–411. https://doi.org/10.1016/S0167-9473(02)00191-3.

Berchtold, A., O. Maitre, and K. Emery. 2020. “Optimization of the mixture transition distribution model using the march package for R.” Symmetry 12 (12): 1–14. https://doi.org/10.3390/sym12122031.

Bolano, D. 2020. “Handling covariates in markovian models with a mixture transition distribution based approach.” Symmetry 12 (4). https://doi.org/10.3390/SYM12040558.

Borchers, Hans W. 2022. Pracma: Practical Numerical Math Functions. https://CRAN.R-project.org/package=pracma.

Chauvet, M., and Z. Senyuz. 2016. “A Dynamic Factor Model of the Yield Curve Components as a Predictor of the Economy.” International Journal of Forecasting 32 (2): 324–43. https://doi.org/https://doi.org/10.1016/j.ijforecast.2015.05.007.

Chen, D. G., and Y. L. Lio. 2009. “A Novel Estimation Approach for Mixture Transition Distribution Model in High-Order Markov Chains.” Communications in Statistics - Simulation and Computation 38 (5): 990–1003. https://doi.org/10.1080/03610910802715009.

Ching, W. K., E. S. Fung, and M. K. Ng. 2002. “A Multivariate Markov Chain Model for Categorical Data Sequences and Its Applications in Demand Predictions.” IMA Journal of Management Mathematics 13 (3): 187–99. https://doi.org/10.1093/imaman/13.3.187.

———. 2003. “A Higher-Order Markov Model for the Newsboy’s Problem.” The Journal of the Operational Research Society 54 (3): 291–98.

Ching, W. K., and M. K. Ng. 2006. Markov Chains: Models, Algorithms and Applications. Springer. https://doi.org/10.1007/0-387-29337-X.

Ching, W. K., M. K. Ng, and E. S. Fung. 2008. “Higher-order multivariate Markov chains and their applications.” Linear Algebra and Its Applications 428 (2-3): 492–507. https://doi.org/10.1016/j.laa.2007.05.021.

Damásio, B. 2013. “Multivariate Markov Chains - Estimation, Inference and Forecast. A New Approach: What If We Use Them As Stochastic Covariates?” Master dissertation, Universidade de Lisboa, Instituto Superior de Economia e Gestão. http://hdl.handle.net/10400.5/6397.

———. 2018. “Essays on Econometrics: Multivariate Markov Chains.” {PhD} dissertation, Universidade de Lisboa, Instituto Superior de Economia e Gestão. https://www.repository.utl.pt/bitstream/10400.5/18128/1/TD-BD-2019.pdf.

Damásio, B., and S. Mendonça. 2019. “Modelling insurgent-incumbent dynamics: Vector autoregressions, multivariate Markov chains, and the nature of technological competition.” Applied Economics Letters 26 (10): 843–49. https://doi.org/10.1080/13504851.2018.1502863.

———. 2020. “Leader-Follower Dynamics in Real Historical Time: A Markovian Test of Non-Linear Causality Between Sail and Steam (Co-)development, Mimeo.”

Damásio, B., and J. Nicolau. 2014. “Combining a regression model with a multivariate Markov chain in a forecasting problem.” Statistics & Probability Letters 90: 108–13. https://doi.org/https://doi.org/10.1016/j.spl.2014.03.026.

———. 2020. “Time inhomogeneous multivariate Markov chains : detecting and testing multiple structural breaks occurring at unknown dates.” REM Working Papers 0136–2020. REM Working Papers. Instituto Superior de Economia e Gestão. http://hdl.handle.net/10400.5/20164.

Dombrosky, A. M., and J. Haubrich. 1996. “Predicting Real Growth Using the Yield Curve.” Economic Review I (Q): 26–35. https://EconPapers.repec.org/RePEc:fip:fedcer:y:1996:i:qi:p:26-35.

Estrella, A., and F. S. Mishkin. 1996. “The yield curve as a predictor of U.S. recessions.” Current Issues in Economics and Finance 2 (Jun). https://www.newyorkfed.org/research/current_issues/ci2-7.html.

Henningsen, A., and O. Toomet. 2011. “maxLik: A Package for Maximum Likelihood Estimation in R.” Computational Statistics 26 (3): 443–58. https://doi.org/10.1007/s00180-010-0217-1.

Islam, M. A., S. Arabia, and R. I. Chowdhury. 2004. “A Three State Markov Model for Analyzing Covariate Dependence.” International Journal of Statistical Sciences 3 (i): 241–49. http://www.ru.ac.bd/stat/wp-content/uploads/sites/25/2019/01/P21.V3s.pdf.

Islam, M. A., and R. I. Chowdhury. 2006. “A higher order Markov model for analyzing covariate dependence.” Applied Mathematical Modelling 30 (6): 477–88. https://doi.org/10.1016/j.apm.2005.05.006.

Jackson, Christopher. 2011. “Multi-State Models for Panel Data: The Msm Package for r.” Journal of Statistical Software 38: 1–28. https://doi.org/10.18637/jss.v038.i0810.18637/jss.v038.i08.

Jacobs, P. A., and A. W. Lewis. 1978. “Discrete Time Series Generated by Mixtures II : Asymptotic Properties.” Journal of the Royal Statistical Society: Series B (Methodological) 40 (2): 222–28. https://www.jstor.org/stable/2984759.

Kalbfleisch, J. D., and J. F. Lawless. 1985. “The analysis of panel data under a Markov assumption.” Journal of the American Statistical Association 80 (392): 863–71. https://doi.org/10.1080/01621459.1985.10478195.

Kijima, M., K. Komoribayashi, and E. Suzuki. 2002. “A multivariate Markov model for simulating correlated defaults.” Journal of Risk 4 (July). https://doi.org/10.21314/JOR.2002.066.

Le, N. D., R. D. Martin, and A. Raftery. 1996. “Modeling Flat Stretches, Brusts, and Outliers in Time Series Using Mixture Transition Distribution Models.” Journal of the American Statistical Association 91 (436): 1504–15. https://doi.org/10.1111/j.2517-6161.1985.tb01383.x.

Lèbre, S., and P. Y. Bourguignon. 2008. “An EM algorithm for estimation in the mixture transition distribution model.” Journal of Statistical Computation and Simulation 78 (8): 713–29. https://doi.org/10.1080/00949650701266666.

Logan, J. 1981. “A structural model of the higher‐order Markov process incorporating reversion effects.” The Journal of Mathematical Sociology 8 (1): 75–89. https://doi.org/10.1080/0022250X.1981.9989916.

Maitre, O., and K. Emery. 2020. March: Markov Chains. https://CRAN.R-project.org/package=march.

Martin, R. D., and A. Raftery. 1987. “Non-Gaussian State-Space Modeling of Nonstationary Time Series: Comment: Robustness, Computation, and Non-Euclidean Models.” Journal of the American Statistical Association 82 (400): 1044–50. https://doi.org/10.2307/2289377.

McMillan, D. G. 2021. “Predicting GDP Growth with Stock and Bond Markets: Do They Contain Different Information?” International Journal of Finance & Economics 26 (3): 3651–75. https://doi.org/https://doi.org/10.1002/ijfe.1980.

Mehran, F. 1989. “Analysis of Discrete Longitudinal Data: Infinite-Lag Markov Models.” In Statistical Data Analysis and Inference, 533–41. Amsterdam: North-Holland. https://doi.org/https://doi.org/10.1016/B978-0-444-88029-1.50053-8.

Muenz, L. R., and L. V. Rubinstein. 1985. “Markov Models for Covariate Dependence of Binary Sequences .” Biometrics 41 (1): 91–101. http://www.jstor.org/stable/2530646.

Nicholson, W. 2013. DTMCPack: Suite of Functions Related to Discrete-Time Discrete-State Markov Chains. https://CRAN.R-project.org/package=DTMCPack.

Nicolau, J. 2014. “A new model for multivariate markov chains.” Scandinavian Journal of Statistics 41 (4): 1124–35. https://doi.org/10.1111/sjos.12087.

Nicolau, J., and F. I. Riedlinger. 2014. “Estimation and inference in multivariate Markov chains.” Statistical Papers 56 (4): 1163–73. https://doi.org/10.1007/s00362-014-0630-6.

Pegram, G. 1980. “An Autoregressive Model for Multilag Markov Chains.” Journal of Applied Probability 17 (2): 350–62. https://doi.org/10.2307/3213025.

Raftery, A. 1985. “A Model for High-Order Markov Chains.” Journal of the Royal Statistical Society: Series B (Methodological) 47 (3): 528–39. https://doi.org/10.1111/j.2517-6161.1985.tb01383.x.

Raftery, A., and S. Tavaré. 1994. “Estimation and Modelling Repeated Patterns in High Order Markov Chains with the Mixture Transition Distribution Model.” Applied Statistics 43 (1): 179–99. https://doi.org/10.2307/2986120.

Rajarshi, M. B. 2013. Statistical Inference for Discrete Time Stochastic Processes. SpringerBriefs in Statistics. http://www.springer.com/978-81-322-0762-7.

Regier, M. H. 1968. “A Two-State Markov Model for Behavioral Change.” Journal of the American Statistical Association 63 (323): 993–99. https://doi.org/10.1080/01621459.1968.11009325.

Siu, T. K., W. K. Ching, E. S. Fung, and M. K. Ng. 2005. “On a multivariate Markov chain model for credit risk measurement.” Quantitative Finance 5 (6): 543–56. https://doi.org/10.1080/14697680500383714.

Spedicato, G. A. 2017. “Discrete Time Markov Chains with r.” The R Journal, July. https://journal.r-project.org/archive/2017/RJ-2017-036/index.html.

Spilerman, S., and B. Singer. 1976. “The Representation of Social Processes by Markov Models.” American Journal of Sociology 82 (1): 1–54. https://www.jstor.org/stable/2777460.

Tian, R., and G. Shen. 2019. “Predictive Power of Markovian Models: Evidence from US Recession Forecasting.” Journal of Forecasting 38 (6): 525–51. https://doi.org/https://doi.org/10.1002/for.2579.

Varadhan, R. 2015. Alabama: Constrained Nonlinear Optimization. https://CRAN.R-project.org/package=alabama.

Venables, W. N., and B. D. Ripley. 2002. Modern Applied Statistics with s. Fourth. New York: Springer. https://www.stats.ox.ac.uk/pub/MASS4/.

Wang, C., T. Z. Huang, and W. K. Ching. 2014. “A new multivariate Markov chain model for adding a new categorical data sequence.” Mathematical Problems in Engineering 2014. https://doi.org/10.1155/2014/502808.

Wasserman, S. 1980. “Analyzing social networks as stochastic processes.” Journal of the American Statistical Association 75 (370): 280–94. https://doi.org/10.1080/01621459.1980.10477465.

Wickham, Hadley. 2016. Ggplot2: Elegant Graphics for Data Analysis. Springer-Verlag New York. https://ggplot2.tidyverse.org.

Wong, C. S., and W. K. Li. 2001. “On a mixture autoregressive conditional heteroscedastic model.” Journal of the American Statistical Association 96 (455): 982–95. https://doi.org/10.1198/016214501753208645.

Zhang, X., M. L. King, and R. J. Hyndman. 2006. “A Bayesian approach to bandwidth selection for multivariate kernel density estimation.” Computational Statistics and Data Analysis 50 (11): 3009–31. https://doi.org/10.1016/j.csda.2005.06.019.

Zhu, D. M., and W. K. Ching. 2010. “A new estimation method for multivariate Markov chain model with application in demand predictions.” Proceedings - 3rd International Conference on Business Intelligence and Financial Engineering, BIFE 2010, 126–30. https://doi.org/10.1109/BIFE.2010.39.

	Hypothesis	Test
Power	\(H_0: \lambda_{11} = 0\)	\(\frac{\widehat{\lambda}_{11}^2}{se(\widehat{\lambda}_{11})^2} \sim \chi^2_{(1)}\)
	\(H_0: \lambda_{12} = 1\)	\(\frac{(\widehat{\lambda}_{12}-1)^2}{se(\widehat{\lambda}_{12})^2} \sim \chi^2_{(1)}\)
Dimension	\(H_0: \lambda_{11} = 1\)	\(\frac{(\widehat{\lambda}_{11}-1)^2}{se(\widehat{\lambda}_{11})^2} \sim \chi^2_{(1)}\)
	\(H_0: \lambda_{12} = 0\)	\(\frac{\widehat{\lambda}_{12}^2}{se(\widehat{\lambda}_{12})^2} \sim \chi^2_{(1)}\)

Variable	Minimum	1\(^{st}\) Quantile	Median	Mean	3\(^{rd}\) Quantile	Maximum
\(spread_{t}\)	-0.52	0.92	1.54	1.454	2.03	2.97
\(r_{t;SP500}\)	-12.765	-0.32	0.07	0.054	0.518	8.968
\(r_{t;DJIA}\)	-13.842	-0.327	0.071	0.046	0.508	10.764