16.1 Important notes about the algorithms
Specification of a properly constrained model with a unique solution is the responsibility of the user; MARSS() has no way to tell whether you have specified an insufficiently constrained model, one with, correspondingly, an infinite number of solutions.
How do you know if the model is properly constrained? If you are using a MARSS model form that is widely used, then you can probably assume that it is properly constrained. If you go to the papers where the model or method was developed, the constraints necessary to ensure “identifiability” will likely be addressed if they are an issue. Are you fitting novel MARSS models? Then you will need to do some study on identifiability in this class of models using textbooks. Often textbooks do not address identifiability explicitly; rather, it is addressed implicitly by only showing models constructed in such a way that they are identifiable. In our work, if we suspect identification problems, we will often first do a Bayesian analysis with flat priors and look for oddities in the posteriors, such as ridges, plateaus, or bimodality.
All the EM code in the MARSS package is currently written in native R, so model fitting is slow. The classic Kalman filter/smoother algorithm, as shown on pages 331-335 of Shumway and Stoffer (2006), is based on the original smoother presented in Rauch (1963). This Kalman filter is provided in the function MARSSkfss(), but the default Kalman filter and smoother used in the MARSS package is based on the algorithm in Kohn and Ansley (1989) and papers by Koopman et al. That Kalman filter and smoother is provided in the KFAS package (Helske 2021). Table 2 in Koopman (1993) indicates that the classic algorithm is 40-100 times slower than the algorithm given in Kohn and Ansley (1989), Koopman (1993), and Koopman, Shephard, and Doornik (1999). The MARSS package function MARSSkfas() provides a translator between the model objects in MARSS and those in KFAS so that the KFAS functions can be used. MARSSkfas() also includes a lag-one covariance smoother algorithm, since this is not output by the KFAS functions, and it provides proper formulation of the priors so that one can use the KFAS functions when the prior on the states is set at \(t=0\) instead of \(t=1\). Simply offsetting your data to start at \(t=2\) and sending that value to \(t_{init}=1\) in the KFAS Kalman filter would not be mathematically correct!
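For example, you can select which filter is used at fit time or run either filter directly on a fitted model. This is a minimal sketch, assuming the fun.kf argument and the MARSSkfss()/MARSSkfas() functions as documented in recent MARSS releases:

```r
library(MARSS)

# a short univariate example; data are a 1 x T matrix (simulated here)
dat <- matrix(cumsum(rnorm(50)) + rnorm(50, sd = 0.2), nrow = 1)

# default: the fast KFAS-based filter/smoother via MARSSkfas()
fit <- MARSS(dat)

# request the classic (slower) Shumway-and-Stoffer-style filter instead
fit.ss <- MARSS(dat, fun.kf = "MARSSkfss")

# either filter/smoother can also be run on a fitted model object,
# e.g. to extract the smoothed states
kf <- MARSSkfss(fit)
kf$xtT  # smoothed state estimates
```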
EM algorithms will quickly get in the vicinity of the maximum likelihood, but the final approach to the maximum is generally slow relative to quasi-Newton methods. On the flip side, EM algorithms are quite robust to initial conditions choices and can be extremely fast at getting close to the MLE values for high-dimensional models. The MARSS package also allows one to use the BFGS method to fit MARSS models, so one can use an EM algorithm to get close and then the BFGS algorithm to polish off the estimate. Restricted maximum-likelihood (REML) algorithms are also available for AR(1) state-space models, both univariate (Staples, Taper, and Dennis 2004) and multivariate (Hinrichsen and Holmes 2009). REML can give parameter estimates with lower variance than plain maximum-likelihood algorithms. Another maximum-likelihood method is data cloning, which adapts the MCMC algorithms used in Bayesian analysis for maximum-likelihood estimation (Lele, Dennis, and Lutscher 2007).
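A sketch of that two-step strategy, assuming (as in recent MARSS releases) that a fitted model object can be passed as the inits argument:

```r
library(MARSS)

dat <- matrix(cumsum(rnorm(60)) + rnorm(60, sd = 0.3), nrow = 1)

# step 1: the EM algorithm ("kem", the default) gets close to the MLE quickly
fit.em <- MARSS(dat, method = "kem")

# step 2: polish the estimates with BFGS, starting from the EM solution
fit.bfgs <- MARSS(dat, method = "BFGS", inits = fit.em)
```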
Missing values are seamlessly accommodated with the MARSS package. Simply specify missing data with NAs. The likelihood computations are exact and will deal appropriately with missing values. However, innovations bootstrapping, the non-parametric bootstrap developed by Stoffer and Wall (1991), cannot be done if there are missing values; instead, parametric bootstrapping must be used.
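For instance, a minimal sketch with simulated data:

```r
library(MARSS)

dat <- matrix(cumsum(rnorm(40)) + rnorm(40, sd = 0.2), nrow = 1)
dat[1, c(5, 6, 21)] <- NA  # mark missing observations with NA

# the exact likelihood computation handles the NAs; no imputation is needed
fit <- MARSS(dat)
```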
You should be aware that maximum-likelihood estimates of variance in MARSS models are fundamentally biased, regardless of the algorithm used. This bias is more severe when one or the other of \(\mathbf{R}\) or \(\mathbf{Q}\) is very small, and the bias does not go to zero as sample size goes to infinity. The bias arises because variance is constrained to be positive. Thus if \(\mathbf{R}\) or \(\mathbf{Q}\) is essentially zero, its mean estimate will not be zero, so that variance estimate will be biased high while the estimate of the other variance will be biased low. You can generate unbiased variance estimates using a bootstrap estimate of the bias; the function MARSSparamCIs() will do this. However, be aware that adding an estimated bias to a parameter estimate will increase the variance of your parameter estimate. The amount of variance added will depend on sample size.
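A sketch of a bootstrap-based bias estimate, assuming the method and nboot arguments and the par.bias element as documented in the MARSS versions we have used:

```r
library(MARSS)

dat <- matrix(cumsum(rnorm(60)) + rnorm(60, sd = 0.3), nrow = 1)
fit <- MARSS(dat)

# parametric bootstrap confidence intervals; the bootstrap also provides
# a bias estimate for each parameter
fit.ci <- MARSSparamCIs(fit, method = "parametric", nboot = 500)
fit.ci$par.bias  # bootstrap estimate of parameter bias
```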
You should also be aware that mis-specification of the prior on the initial states (\(\boldsymbol{\pi}\) and \(\boldsymbol{\Lambda}\)) can have catastrophic effects on your parameter estimates if your prior conflicts with the distribution of the initial states implied by the MARSS model. These effects can be very difficult to detect because the model will appear to be well fitted. Unless you have a good idea of what the parameters should be, you might not realize that your prior conflicts with the model.
The default behavior for MARSS() is to set \(\boldsymbol{\Lambda}\) to zero and estimate \(\boldsymbol{\pi}\). This does not put any constraints on \(\boldsymbol{\Lambda}\) (there is no \(\boldsymbol{\Lambda}\) to put constraints on) and circumvents this problem. However, if you plan to put constraints on \(\boldsymbol{\pi}\) or \(\boldsymbol{\Lambda}\), you should familiarize yourself with the most common problems. The common problems we have found with priors on \(\mathbf{x}_0\) are the following. Problem 1) The correlation structure in \(\boldsymbol{\Lambda}\) (whether the prior is diffuse or not) does not match the correlation structure in \(\mathbf{x}_0\) implied by your model. For example, you specify a diagonal \(\boldsymbol{\Lambda}\) (independent states), but the implied distribution has correlations. Problem 2) The correlation structure in \(\boldsymbol{\Lambda}\) does not match the structure in \(\mathbf{x}_0\) implied by constraints you placed on \(\boldsymbol{\pi}\). For example, you specify that all values in \(\boldsymbol{\pi}\) are shared, yet you specify that \(\boldsymbol{\Lambda}\) is diagonal (independent).
Unfortunately, using a diffuse prior does not help with these two problems because the diffuse prior still has a correlation structure and can still conflict with the implied correlation in \(\mathbf{x}_0\). One way to get around these problems is to set \(\boldsymbol{\Lambda}=0\) (an \(m \times m\) matrix of zeros) and estimate \(\boldsymbol{\pi} \equiv \mathbf{x}_0\) only. Now \(\boldsymbol{\pi}\) is a fixed but unknown (estimated) parameter, not the mean of a distribution. In this case, \(\boldsymbol{\Lambda}\) does not exist in your model and there is no conflict with the model. This is the default behavior of MARSS(). Unfortunately, estimating \(\boldsymbol{\pi}\) as a parameter is not always robust. If you specify that \(\boldsymbol{\Lambda}=0\) and specify that \(\boldsymbol{\pi}\) corresponds to \(\mathbf{x}_0\), but your model explodes when run backwards in time, you cannot estimate \(\boldsymbol{\pi}\) because you cannot get a good estimate of \(\mathbf{x}_0\). Sometimes this can be avoided by specifying that \(\boldsymbol{\pi}\) corresponds to \(\mathbf{x}_1\) so that it can be constrained by the data \(\mathbf{y}_1\).
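In the MARSS() model specification, \(\boldsymbol{\pi}\) and \(\boldsymbol{\Lambda}\) correspond to the x0 and V0 elements of the model list, and tinitx controls whether the prior is placed at \(t=0\) or \(t=1\). A minimal sketch of the two options discussed above (argument names as in recent MARSS releases):

```r
library(MARSS)

dat <- matrix(cumsum(rnorm(50)) + rnorm(50, sd = 0.3), nrow = 1)

# default behavior: Lambda (V0) = 0 and pi (x0) estimated as a parameter at t = 0
fit.x0 <- MARSS(dat, model = list(x0 = "unequal", V0 = "zero", tinitx = 0))

# if estimating x0 at t = 0 is unstable, place pi at t = 1 instead so that
# it is constrained by the first observation y_1
fit.x1 <- MARSS(dat, model = list(x0 = "unequal", V0 = "zero", tinitx = 1))
```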
In summary, if the implied correlation structure of your initial states is independent (a diagonal variance-covariance matrix), you should generally be fine with a diagonal, high-variance prior or with treating the initial states as parameters (with \(\boldsymbol{\Lambda}=0\)). But if your initial states have an implied correlation structure that is not independent, then proceed with caution. Proceeding with caution means assuming you have problems and testing how well your model behaves with simulated data.
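One way to run such a check is to simulate data from the fitted model and refit. This is a sketch, assuming the MARSSsimulate() function and its sim.data element as documented in the versions we have used:

```r
library(MARSS)

dat <- matrix(cumsum(rnorm(50)) + rnorm(50, sd = 0.3), nrow = 1)
fit <- MARSS(dat)

# simulate a new data set of the same length from the fitted model
sim <- MARSSsimulate(fit, tSteps = ncol(dat), nsim = 1)
y.sim <- matrix(sim$sim.data[, , 1], nrow = nrow(dat))

# refit to the simulated data; if the refitted parameters stray far from
# those in `fit`, suspect problems such as a conflicting prior
refit <- MARSS(y.sim)
```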