Inverted gamma vs. log-normal innovations in MSFSBEKK models in the forecasting of selected Polish exchange rates

The aim of the paper is to compare the forecasting potentials of two classes of Multiplicative Stochastic Factor – scalar BEKK (MSF-SBEKK) models which differ in the type of latent process. In the first class, the innovations of a first order autoregressive structure for the natural logarithm of latent variables are log-normal, whereas in the second class the innovations are inverted gamma distributed. The comparison of the models’ forecasting abilities by means of the predictive Bayes factor as well as the log predictive score and energy score were performed based on the Polish exchange rates. The author considered oneto ten-step-ahead predictions during the period beginning from 3 September 2019 and ending on 2 September 2020, which covers the time of the crisis caused by COVID-19. The author concluded that for most of the forecast horizons, the considered log-normal innovations outperformed the inverted gamma ones.


Introduction
In volatility modelling of financial time series, two major classes of models are used: the autoregressive conditionally heteroscedastic (ARCH) and stochastic volatility (SV) models. The conceptual difference between the two classes lies in modelling conditional variance. In the ARCH-type models, volatility is described by a deterministic function of the past of the process, whereas in the SV models the conditional variance is subject to unpredictable shocks. In joint modelling of multiple time series the following multivariate counterparts of these models are used: Multivariate GARCH (MGARCH) and Multivariate Stochastic Volatility (MSV) classes (see : Bauwens, Laurent, and Rombouts, 2006); Tsay, 2005). The multivariate approach to modelling financial time series is much more difficult than the univariate one, as it explicitly takes into account the full conditional covariance structure of asset prices, i.e. individual volatilities and correlations. Only a few of them could serve as practical tools for large portfolios. A solution to the problem of multivariate volatility modelling is using the hybrid models proposed by (Osiewalski, 2009) and (Osiewalski andPajor, 2009, 2018), based on scalar BEKK (SBEKK; Baba, Engle, Kraft, and Kroner, 1989) correlation structure and on one latent process (Multiplicative Stochastic Factor, MSF). The hybrid models exploit the advantages of both model classes: the flexibility of the SV class, where volatility is modeled by latent stochastic processes, and the relative simplicity of the MGARCH class.
The paper focuses on two types of the MSF-SBEKK specification: the LN-MSF-SBEKK and IG-MSF-SBEKK. The LN-MSF-SBEKK structure is obtained by multiplying the SBEKK conditional covariance matrix at time , Σ , by a scalar random variable such that {ln } is a Gaussian AR(1) latent process with auto-regression parameter . The LN-MSF-SBEKK process can be treated as a direct extension of the SBEKK process with unknown conditional distribution. When = 0, the LN-MSF-SBEKK process reduces itself to the SBEKK process with the conditional distribution being a continuous mixture of multivariate normal distributions with covariance matrices Σ , where ′ are independent and log-normally distributed. On the other hand, the multivariate Student t distribution can be expressed as a scale mixture of normal distributions with the inverted gamma (IG) as a mixing distribution. This fact was examined in Osiewalski andPajor (2018, 2019), where the IG-MSF-SBEKK specification was proposed as a natural hybrid extension of the SBEKK process with the Student t conditional distribution (t-SBEKK). In the IG-MSF-SBEKK specification the latent process {ln } remains an autoregressive process of order one, but with log inverted gamma innovations. For = 0 the latent variables (where ∈ ) are independent and have inverted gamma distribution. Thus = 0 leads to the t-SBEKK specification, in which the conditional distribution is represented as a continuous mixture of multivariate normal distributions with covariance matrices Σ and an inverted gamma distribution of . For ≠ 0 the unconditional distribution of the latent variables is
The prior distributions reflect little prior knowledge about the model parameters. As regards initial conditions for Σ , we take Σ 0 = 0 2 and treat 0 2 > 0 as an additional parameter, exponentially distributed a priori with mean 1, whereas the initial value of , 0 , is assumed to be nonrandom and equal to 1. When it comes to 0 , the first two vectors of observations are used as initial conditions for .

Forecast evaluation
The standard approach to the Bayesian forecast evaluation is based on the predictive likelihoodthe predictive data density value at the observed future data. Let 0 = ( 0 , 1 , … , ) be the vector of observations up to time , be the vector of unknown parameters and 1 +ℎ = ( 1 , … , +ℎ ) the latent variable vector. The predictive density function for h-step-ahead forecast is as follows: where ( +ℎ | 0 , 1 +ℎ , 0 , ) is the conditional density of the future observation vector given the vector of parameters, , and latent variables,  denote observed values of 0 and +ℎ , respectively. The predictive likelihood conditional on 0 , is the real number ( +ℎ | 0 , ). To compute this predictive likelihood we draw 1 +ℎ,( ) , ( ) for = 1, … , from the posterior distribution, next, if ℎ > 1, simulate the vector +1 +ℎ−1,( ) from conditional sampling distribution of observations given 0 , , 1 +ℎ,( ) and ( ) , and then calculate the average: or for ℎ = 1. (8) Let us assume that there are two competing models: and . Then the main Bayesian criterion of model comparison from the predictive point of view is the predictive Bayes factor, which is the ratio of the predictive likelihoods. In fact the posterior odds ratio can be expressed as the product of the predictive Bayes factor and the posterior odds ratio given the observed data 0 , : Thus, the predictive Bayes factor: updates the ratio of posterior probabilities based on the first observations after observing predicted data +ℎ . It is well-known that the negative log predictive likelihood, − log ( +ℎ | 0 ), is the logarithmic score (log score) for the predictive distribution at observation +ℎ (see: Bernardo and Smith, 1994;Dawid and Musio, 2014). Thus the logarithm of the Bayes factor is the difference of the log scores for the two models: The logarithm of the Bayes factor measures by how much the log score for model is better (smaller) than that for model (see: Dawid and Musio, 2015). The value of log ( +ℎ | 0 , ) < 0 indicates the poor forecasting performance of model at time for the h-step-ahead forecast in comparison to model . The log ( +ℎ | 0 , , ) returns a high value if +ℎ is in the high density region of ( +ℎ | 0 , , ), and a low value otherwise. The drawback of the log score is that it does not depend directly on the shape of the entire predictive density, but only on the value of the predictive density at the realized value of +ℎ . To compare the forecasting ability of the models under consideration for the forecast horizon ℎ, in the period from + 1 to + ℎ (where ℎ represents the number of h-step-ahead forecasts, ℎ ≤ ), we aggregate log scores and rank models by the average of the logarithms of the predictive likelihoods: The (ℎ, + 1, + ℎ , ) is a positively oriented score, which means that its larger values indicate more accurate density forecasts (in other words, larger values of this score are better). Unfortunately, the logarithmic scoring rule is sensitive to outliers.
Note that for ℎ = 1 (1, + 1, + 1 , ) = 1 1 log ( +1 , … , + 1 | 0 , , ) thus the average of the log predictive density scores times the length of forecasting period amounts to the predictive likelihood of the observed data from + 1 to + 1 . This is so because the log predictive likelihood at +1 , … , + 1 , can be rewritten as a sum of one-step-ahead log predictive likelihoods: . (13) Consequently, for ℎ = 1 there exists a Bayesian substantiation of the use of the sum of logarithmic scores. Equality (12) breaks down for ℎ > 1. The predictive density at +ℎ , … , + ℎ cannot be decomposed in terms of the h-step-ahead predictive likelihoods. Moreover, some models can perform well for certain forecast horizons while other models can be better for other horizons.
In the paper the author also used the energy score (ES), recommended for assessing the multidimensional predictive density (see e.g. Gneiting and Raftery, 2007;Székely and Rizzo, 2013). Let us assume that , +ℎ denotes the predictive distribution for +ℎ given the data up to time . The energy score is defined as: where ∈ (0, 2), X and Y are independent random vectors with distribution , +ℎ , and ‖•‖ denotes the Euclidean norm. To estimate the energy score, we use the following formula: where ( ) and ( ) are drawn from predictive distribution of +ℎ , using the MCMC methods presented in (Osiewalski and Pajor, 2009;Pajor, 2020). In this paper ES is applied with = 1, and it is aggregated over , …, + ℎ − 1: The average energy score is used to rank (of course informally from the Bayesian point of view) competing models.

Empirical results
In this part of the paper the author analysed financial data of daily quotations on three major Polish exchange rates: the EURO to the Polish zloty (EUR/PLN), the US dollar to Polish zloty (USD/PLN), and the British pound to the Polish zloty (GBP/PLN, all data were downloaded from http://stooq.com). The daily currency return data used in this study cover the period from 28 December 2017 to 2 September 2020 (see Figures  1 and 2). The dataset of the growth rates of the exchange rates consists of 693 observations (for each series). The first two observations were treated as an initial condition, while the last 237 observations were used for the forecast evaluation, thus = 456. In other words, to compare the predictive capacity of the models under consideration, the author used these first 456 observations as a training sample. The predictive capacity of the models was analysed in the most recent 237 trading days ( = 237). The author took into consideration one-to ten-step-ahead predictions during the period beginning on 3 October 2019 and ending on 2 September 2020, thus obtaining 228 predictive distributions for the one-to ten-dayahead forecast horizons. The predictive distributions were calculated based on the whole dataset available at time + for each = 0, 1, … , − 10. This resulted in 2280 estimated predictive distributions. The sequence of the one-step-ahead predictive distributions covers the period from 3 October 2019 to 20 August 2020, while the sequence of the ten-stepahead predictive distributions covers the period 16 October 2019-2 September 2020.
Moreover, the forecasting period was split using 12 March 2020 as the dividing point. The first subperiod (3 October 2019-11 March 2020) is characterized by the relatively (as compared to the second subperiod) low volatility of the forecasted exchange rates, whereas the second subperiod (12 March 2020-2 September 2020) contains the time of the crisis caused by the COVID-19 pandemic (the first incidence of COVID-19 was reported by the Polish authorities on 4 March 2020, and the lockdown-type control measures started on 12 March). Due to the coronavirus (COVID-19) and lockdown, significant turbulences and high volatility were observed, especially at the beginning of the pandemic lockdown period (see Figures 1 and 2).  As one can see from Figures 1 and 2, the EURO growth rates series seems to have the lowest volatility. The exchange rate series of USD/PLN appears to be the most volatile. The USD/PLN exchange rate strengthened to 4.29327 on 23 March 2020 from 3.87587 on 2 March 2020, and next fell below 3.6900 on 28 August 2020. A similar (but not the same) pattern during the forecasting period is observed in the plot of the GBP/PLN exchange rate. During the second predictive subperiod, relatively large fluctuations in all the exchange rates considered were observed. Thus it was possible to check how the outliers affect the predictive ability of the (MSF-)SBEKK models.
The VAR(1)-MSF-SBEKK models were re-estimated at a daily frequency. The computations are based on the 30000 Markov chain Monte Carlo posterior samples after having burnt 50000 cycles in each model.

A comparison of models with the predictive Bayes factor
As mentioned above, the author considered two basic specifications of the MSF-SBEKK models, LN-MSF-SBEKK and IG-MSF-SBEKK. Additionally, there are two natural reductions of the two hybrid models to SBEKK specifications, LN-SBEKK and IG-SBEKK (t-SBEKK), which result from imposing a zero restriction on . Moreover, there are two reductions to pure MSF specifications, LN-MSF and IG-MSF, which result from imposing zero restrictions on 1 and 2 . For the sake of comparison, the study also considered the VAR model with constant conditional covariance matrix, despite the fact that this model seems to be inadequate for the type of the data considered. The model assumptions are presented in Table 1.    Table 2 presents the decimal logarithms of the predictive Bayes factors for the whole forecasting period ( + 1, … , + 228) and for two subperiods ( + 1, … , + 113 and + 114, … , + 228) in favour of the LN-MSF-SBEKK model ( 1 ) versus the other models under consideration. The predictive Bayes factors (the preferred method of the Bayesian forecast comparison) presented in Table 2 show how the additional data (the whole path of observed future data in each period considered) influenced the posterior probability of the LN-MSF-SBEKK model relative to all the specifications under consideration. As one can see from Table 2 ( 2 | 0 , )) increased by about 15 orders of magnitude after observing predicted data +1 +228, . The increase of the posterior odds ratio was higher for the second subperiod (the lockdown period) by about 10 orders of magnitude. While comparing two pure SV specifications (the LN-MSF and the IG-MSF), the author concluded that log-normal innovations in the latent process were preferred by each additional set of data: +1 +228, , +1 +113, and +114 +228, . The opposite conclusion can be reached if one compares the two pure SBEKK specifications. The SBEKK model with conditional Student t distribution is better from the predictive point of view than the LN-SBEKK one. It seems that the SBEKK structure is less flexible in dealing with outliers, and therefore it requires a conditional distribution with thicker tails. Obviously, the last position is occupied by the VAR model with constant conditional covariances.

Forecast evaluation with the log-predictive score and energy score
This section presents the results of comparing the predictive ability of the considered Bayesian models with the use of non-Bayesian tools. Tables 3 to 5 present the average log-predictive scores along with their respective ranks. The first conclusion from Tables 3 to 5 is that the LN-MSF-SBEKK model produced the best forecasts for all the considered forecast horizons and for all the forecasting periods considered, except for ℎ = 1 in the lockdown period (11 March 2020-2 September 2020), for which the LN-MSF model turned out to be the best. Moreover, for the whole period + ℎ, … , + 228 + ℎ (3 September 2019-2 September 2020) and for the second subperiod (11 March 2020-2 September 2020) the first four positions in the ranks are occupied by the LN-MSF, IG-MSF, LN-MSF-SBEKK, and IG-MSF-SBEKK models. Thus, the second conclusion was that the presence of a latent process with dependent latent variables is more important than allowing for the pure SBEKK structure. Both the LN-SBEKK and IG-SBEKK models, which do not use any latent autoregressive processes and therefore are less flexible in dealing with outliers, appeared as the worst among conditional heteroscedastic models in terms of the average log-predictive score. Surprisingly, the IG-MSF-SBEKK model fits the predicted data worse (in terms of the average log predictive score) than the IG-MSF and LN-MSF ones. Both of the latter specifications used one latent process common to all the conditional variances and covariances. This assumption of common dynamics led to constant conditional correlations. While comparing the average log predictive score for the LN-MSF and IG-MSF models, the author found that the inverted gamma innovations in the autoregressive specification for the latent process were outperformed by log-normal ones. However, the pure SBEKK-type models in most cases ranked in the reverse order: the IG-SBEKK (t-SBEKK) model is better than the LN-SBEKK one. Turning attention to the energy score, one can see that the previous conclusion (based on the average log-predictive score) that the LN-MSF--SBEKK model outperformed the other models holds for the whole forecasting period considered (3 September 2019-2 September 2020 as well as for the subperiod 12 March 2020-2 September 2020. In the case of these two forecasting periods, the rank correlation coefficients were large and positive, see Table 8. Table 6   to a substantial deterioration in forecast performance. In fact, the pure SBEKK models took the last two positions among the models with conditional heteroscedasticity. A different result was obtained for the period 3 September 2019-11 March 2020 (characterized by relatively low volatility), in which the IG-MSF-SBEKK model was clearly superior to the other specifications at all the forecast horizons. For ℎ = 2, … ,10 the second position was occupied by the IG-SBEKK model, and the fourth position by the LN-MSF-SBEKK one. As one can see from Table 8, for the subperiod 3 September 2019-11 March 2020 the rank correlation coefficients were negative. However, the differences between the average energy scores seem to be within the limits of the numerical error.
Anna Pajor    As expected, the dispersion of the predictive distributions (measured by e.g. the interquartile range) increased with the forecast horizon. In fact, for the USD/PLN and EUR/PLN exchange rates, the average interquartile range for the ten-day forecast horizon was about three times larger than for the one-day horizon. In turn, for the GBP/PLN the average interquartile range for the ten-day forecast horizon was about eight times larger than for the one-day horizon. To illustrate how the predictive distribution changed along with the increasing forecast horizon, the author presents (in Figures  4 and 5) quantiles of the predictive distributions for ℎ = 1 and ℎ = 10, obtained in the models considered. The predictive distributions for ℎ = 10 were much more spread than those for ℎ = 1. Moreover, the accuracy of forecasts with ℎ = 10 was not quite satisfactorya few realized data were outside of the 90% confidence intervals.
As seen from Figures 4 and 5, and while analysing the predictive quantiles, the predictive distributions obtained in the LN-MSF-SBEKK model turned out to be the most dispersedthis dispersion was measured by interquartile ranges and the difference between the quantile of order 0.05 and of order 0.95. The LN-SBEKK and IG-BEKK models are characterized by the least spread of predictive distributions. The differences in the predictive distributions among the models influenced the values of the predictive Bayes factors, log-predictive scores and energy scores.

Conclusion
The paper compared the predictive ability of the LN-MSF-SBEKK and IG-MSF-SBEKK models (and their simplifications) in the context of modelling the main exchange rates for PLN. According to the results obtained on the basis of the average log-predictive score and energy score, the author concluded that for most of the forecast horizons considered, the LN-MSF-SBEKK specification outperformed the IG-MSF-SBEKK one. As measured by the average log-predictive score, the predictive performance of the stochastic volatility models with log-normal innovations dominated that of the SV ones with inverted gamma innovations.
The results can be sensitive to the prior assumptions regarding the model parameters, especially the parameters related to the latent process: , 2 and . In this paper the author did not carry out the study of the sensitivity of these results to a priori assumptions. It is worth mentioning that under the author's prior assumptions about the parameters of latent processes, the marginal variances of the two latent processes (based on lognormal and inverted gamma innovations) had similar prior distributions. In this sense, the prior distributions of the basic parameters in the LN-MSF-