Volatility Forecasting using Hybrid GARCH Neural Network Models: The Case of the Italian Stock Market

In many financial applications, accurate volatility forecasts are extremely valuable. Neural Networks and GARCH-type models have been employed extensively in recent decades to estimate the volatility of financial indices. The aim of this study is to determine whether combining different types of models can improve return volatility forecasts. To this end, two hybrid models are compared with an asymmetric GARCH model and a Neural Network in terms of their ability to predict the volatility of the FTSE MIB index. The results reveal that the hybrid model based on a Neural Network whose inputs are the returns, their lagged values and the conditional volatility estimates obtained from an EGARCH model provides the best predictive power. Moreover, the dominance of this hybrid model is such that it forecast encompasses the remaining models. Finally, significant leverage effects are found in the Italian stock market.


INTRODUCTION
Modelling and forecasting stock market volatility has attracted considerable attention among researchers and financial market participants, since volatility is one of the most essential inputs to several financial applications, from asset and option pricing to risk measurement. Volatility, with respect to financial indices, can be considered a measure of risk, or it can be thought of as the degree to which a stock price fluctuates around its average value. Predicting the returns and volatilities of assets with greater accuracy is critical for investment decisions in financial markets. Market participants employ different approaches for forecasting the volatility of financial variables. According to Nazarian et al. (2013), these approaches are divided into two categories, the classic one and the Neural Network. One of the classic methods is the time-series modelling of financial data with time-varying variance, which has received a lot of interest. Engle (1982) was the first to introduce such models (ARCH), while Bollerslev (1986) generalized them and provided the GARCH model. Since financial data are highly volatile, these models with heteroscedastic errors have been used extensively in finance (Ahmed and Suliman, 2011; Awartani and Corradi, 2005; Curto et al., 2009; Engle and Patton, 2007; Koopman et al., 2005; Liu and Hung, 2010; Perron, 2010 and Marcucci, 2005; Dritsaki, 2017; Kartsonakis-Mademlis and Dritsakis, 2018). However, despite their success in forecasting various variables, their predictive accuracy for financial data is far from satisfactory because the classic methods (structural models) depend on information obtained from historical events. Given the non-linearities and the complex relationships in stock markets, time-series approaches may not be able to capture these characteristics.
For this reason, non-linear and more flexible models such as Neural Networks can produce better results in modelling and forecasting than linear models (Georgescu and Dinucă, 2011; Ghiassi et al., 2006; Güreşen et al.). Nazarian et al. (2013) proposed that the combination of classic methods and Neural Networks (hybrid models) has emerged in response to the lack of consensus on whether the efficient market hypothesis holds, since this method is able to predict the future trend of prices with higher accuracy than the classic methods (Abounoori et al., 2013; Bildirici and Ersin, 2009; Güreşen and Kayakutlu, 2008; Hajizadeh et al., 2012; Kailas, 2012; Khan and Gour, 2013; Merh et al., 2010; Pai and Lin, 2005; Sui et al., 2007; Wang et al., 2012 and Wei et al., 2011).
One of the most crucial applications of GARCH models in finance is forecasting. Given the fact that the volatility is a key input to several financial decision-making models, the performance of a model in predicting volatility is of utmost importance (Kartsonakis-Mademlis and Dritsakis, 2020b). This is the rationale behind this research to produce more accurate volatility forecasts. To this end, we combine GARCH-type models with Artificial Neural Networks in two different ways to achieve higher predictive accuracy for the volatility of the Borsa Italiana Stock Exchange index, FTSE MIB. More specifically, we first model the return series of the stock index with an ARMA model using the Box and Jenkins (1976) methodology. In the second step, the fitness of two asymmetric GARCH models, namely, EGARCH and GJR-GARCH is evaluated and compared. In the third step, we estimate an ANN model. Then, we construct the two hybrid models, namely, the ANN-GARCH model and the GARCH-ANN model. The resulting forecasts from each of the four models (asymmetric GARCH model, ANN model and the two hybrid models) are compared with each other in terms of closeness to the realized volatility. Weekly time-series data from January 7, 1998 to December 6, 2017 was used from which 940 observations (approximately 90% of the total observations) were utilized for estimations (training set) and 100 observations (approximately 10% of the total observations) for out-of-sample forecasts (test set).
The remainder of this paper is organized as follows. Section 2 displays a brief review of the literature. Section 3 provides the methodology. Section 4 presents the data and some preliminary results. The empirical findings are analyzed in Section 5 and some concluding remarks are given in section 6.

LITERATURE REVIEW
Over the last years, Artificial Neural Network models have proven their superiority over other models in time-series forecasting. One of the earliest studies is that of Kryzanowski et al. (1993), who examined the performance of a Neural Network using financial data and seven macroeconomic variables to discriminate between stocks with positive returns and those with negative returns. Their findings suggest that the Neural Network correctly classifies 72% of the positive/negative returns, which is considered high. Donaldson and Kamstra (1997) introduced a combination of a Neural Network with a semi-nonparametric non-linear GARCH model for capturing the impact of volatility on stock returns and evaluated its ability to forecast stock return volatility in Toronto, New York, Tokyo and London. Their results show that the out-of-sample forecasts of their ANN model surpass those of the GARCH, EGARCH and GJR-GARCH models. Qi and Maddala (1999) showed that a Neural Network model improves on the stock price forecasts of a linear regression model. Their results largely hold out-of-sample. Schittenkopf et al. (2000) constructed a semi-nonparametric model to estimate conditional density within a Neural Network framework. Their recurrent mixture density network was based on the basic ideas of GARCH-type models, but it could also model any continuous conditional density, allowing for higher-order time-dependent moments. The authors applied their model to the returns of the FTSE100 and found that its out-of-sample forecasts slightly outperform those of GARCH models. Roh (2007) proposed hybrid models combining time-series and Neural Network models to forecast stock price index volatility in two respects: direction and deviation. He applied the models to the stock market of South Korea and the results revealed the usefulness of Neural Network forecasting combined with time-series models.
Bildirici and Ersin (2009) utilized GARCH-type models and enhanced them with Neural Networks to examine the volatility of daily returns in the Istanbul stock market. Their hybrid models show improved forecasts. Hajizadeh et al. (2012) attempted to increase the ability of GARCH-type models to forecast return volatility. They proposed two hybrid models based on Neural Networks and the EGARCH model. Their results demonstrate that the second hybrid model produces better volatility forecasts in terms of closeness to the realized volatility. Kristjanpoller et al. (2014) tested a hybrid Neural Network-GARCH model to forecast the volatility of three Latin American stock markets, namely, Brazil, Chile and Mexico. Their findings indicate that ANN models are able to increase the forecasting performance of GARCH-type models and that the results are robust to various ANN specifications and volatility measures. Lu et al. (2016) compared the volatility forecasting accuracy of two types of hybrid models combining ANN with asymmetric GARCH models, namely, EGARCH and GJR-GARCH. Their results suggest that the EGARCH-ANN hybrid model outperformed the rest of the models in forecasting the volatility of the Chinese energy market. Kristjanpoller and Minutolo (2016) used an ANN-GARCH model and incorporated financial variables, such as exchange rates and stock market indices, to achieve better forecasts of oil price volatility. Their findings show that the hybrid model improves the volatility forecasting accuracy by 30% as measured by HMSE. Lahmiri (2017) compared the predictive ability of two GARCH models (GARCH and EGARCH), two hybrid models (GARCH-ANN and EGARCH-ANN) and a Neural Network whose inputs were a set of technical indicators.
The results showed that, in terms of MAE, MSE and Theil's inequality coefficient, the simple Neural Network approach predicted the volatility of the two exchange rates (U.S./Canada and U.S./Euro) more accurately than the rest of the models. Bhattacharya and Ahmed (2018) analysed the volatility of crude oil in India and their results favored hybrid models over GARCH models. More specifically, they attempted to forecast oil price volatility by comparing various GARCH-type models with hybrid GARCH-ANN models. Based on the MSE index, they found that the EGARCH-ANN provided the best predictive ability among the models. However, the addition of an exchange rate (Indian Rupee/Saudi Arabian Riyal) as an input to the hybrid model did not provide any further improvement. Ramos-Perez et al. (2019) utilized data from 2000 to 2017 for the S&P500 and constructed a hybrid model to predict its volatility. Their hybrid model was based on machine learning techniques and they compared it with two GARCH-type-ANN models, Heston's (1993) model and a Neural Network. The findings, based on the RMSE index, supported the superiority of their model in all five sub-periods into which the sample was divided.

ARMA/GARCH-type Models
Developing and establishing ARMA models as tools for forecasting financial variables is known as the Box and Jenkins (1976) methodology. This approach to time-series analysis is a method of finding an ARMA(p,q) model that adequately describes the stochastic process that generated the sample. The ARMA model can be expressed as follows:

$$\Phi(B)(r_t - \mu) = \theta(B)\,\varepsilon_t, \qquad \varepsilon_t \mid \Omega_{t-1} \sim N(0, \sigma_t^2)$$

where $r_t$ is the return of the stock market index, $\varepsilon_t$ is the error term and $N$ represents the conditional normal density with zero mean and conditional variance $\sigma_t^2$. Moreover, $\Omega_{t-1}$ is the information available up to time $t-1$, $B$ is the backshift operator and $\mu$ is the series mean. The polynomials $\Phi(B)$ and $\theta(B)$ are referred to as the autoregressive (AR) and moving average (MA) terms, respectively, and are assumed to have all inverted roots inside the unit circle.
In financial markets, volatility as a measure of risk has become a crucial component in various applications such as portfolio and risk management, derivative pricing, Black-Scholes model for option pricing, etc.
Several models based on the stochastic volatility process and time-series modelling have been developed as alternatives to the historical and implied volatility. Since the development of the ARCH model of Engle (1982) and the GARCH model of Bollerslev (1986), many extensions have been proposed. The most widely used models are the asymmetric EGARCH model of Nelson (1991) and the asymmetric GJR-GARCH model of Glosten et al. (1993).

EGARCH
The Exponential Generalized Autoregressive Conditional Heteroscedasticity (EGARCH) model was proposed by Nelson (1991) to take into account the leverage effects of price fluctuations on the conditional variance. This means that a negative shock (bad news) can have a greater impact on volatility than a positive shock (good news) of the same magnitude. In the EGARCH(1,1) model the conditional variance is expressed in logarithmic form as follows:

$$\ln(\sigma_t^2) = \omega + \alpha_1 \left|\frac{\varepsilon_{t-1}}{\sigma_{t-1}}\right| + \gamma_1 \frac{\varepsilon_{t-1}}{\sigma_{t-1}} + \beta_1 \ln(\sigma_{t-1}^2)$$

where $\omega$, $\alpha_1$, $\beta_1$ and $\gamma_1$ are the parameters to be estimated. When $\varepsilon_{t-1}$ is positive, i.e. when there is good news, the overall effect of $\varepsilon_{t-1}$ is $(\alpha_1 + \gamma_1)\,|\varepsilon_{t-1}|/\sigma_{t-1}$. Conversely, if $\varepsilon_{t-1}$ is negative, i.e. there is bad news, the overall effect of $\varepsilon_{t-1}$ is $(\alpha_1 - \gamma_1)\,|\varepsilon_{t-1}|/\sigma_{t-1}$. In other words, the term $\gamma_1 \varepsilon_{t-1}/\sigma_{t-1}$ is the one that accounts for the asymmetry in the EGARCH model when $\gamma_1 \neq 0$. Therefore, the coefficient $\gamma_1$ is the one that determines the leverage effects. In addition, when $\gamma_1 < 0$, a positive shock causes less volatility than a negative shock of the same size (asymmetry). Finally, because the EGARCH model is specified in logarithms, no restrictions on the parameters are needed to avoid a negative conditional variance. In this model the conditional variance $\sigma_t^2$ depends on both the size and the sign of $\varepsilon_{t-1}$.
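The asymmetry described above can be illustrated with a short recursion. The following is a minimal sketch, assuming the common EGARCH(1,1) form in which the log variance depends on the standardized shock and its absolute value; the function name and the parameter values are hypothetical and are not the estimates reported in this paper.

```python
import math

def egarch_variances(eps, omega, alpha1, gamma1, beta1, init_var):
    """Conditional variances from an assumed EGARCH(1,1) recursion:
    ln(s2_t) = omega + alpha1*|z_{t-1}| + gamma1*z_{t-1} + beta1*ln(s2_{t-1}),
    where z_{t-1} = eps_{t-1} / sqrt(s2_{t-1})."""
    variances = [init_var]
    for e_prev in eps[:-1]:
        s2_prev = variances[-1]
        z_prev = e_prev / math.sqrt(s2_prev)
        log_s2 = (omega + alpha1 * abs(z_prev) + gamma1 * z_prev
                  + beta1 * math.log(s2_prev))
        # Exponentiating keeps the variance positive for any parameter values.
        variances.append(math.exp(log_s2))
    return variances

# Hypothetical parameters with gamma1 < 0 (leverage effect).
params = dict(omega=0.0, alpha1=0.2, gamma1=-0.1, beta1=0.9, init_var=1.0)
after_bad_news = egarch_variances([-1.0, 0.0], **params)[1]
after_good_news = egarch_variances([1.0, 0.0], **params)[1]
```

With a negative `gamma1`, the variance following the negative shock is larger than the variance following a positive shock of the same size, which is the leverage effect in this parameterization.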

GJR-GARCH
Another model that addresses the asymmetry in the volatility of a time series is the GJR-GARCH(1,1) model proposed by Glosten et al. (1993). Its form is given by the following equation:

$$\sigma_t^2 = \omega + \alpha_1 \varepsilon_{t-1}^2 + \gamma_1 \varepsilon_{t-1}^2 I_{t-1} + \beta_1 \sigma_{t-1}^2 \tag{5}$$

The model is well defined if the following conditions are met:

$$\omega > 0, \quad \alpha_1 \geq 0, \quad \beta_1 \geq 0, \quad \alpha_1 + \gamma_1 \geq 0$$

where $\omega$, $\alpha_1$, $\beta_1$ and $\gamma_1$ are the parameters to be estimated. $I_{t-1}$ is a dummy variable which takes the value 1 if $\varepsilon_{t-1} < 0$ and 0 otherwise. The model in Eq. 5 suggests that bad news ($\varepsilon_{t-1} < 0$) and good news ($\varepsilon_{t-1} > 0$) may have different impacts on the conditional variance. If the coefficient $\gamma_1$ is positive, then there is asymmetry and hence leverage effects. The leverage effect is described by the sum $(\alpha_1 + \gamma_1)$ for negative shocks, which is greater than the impact $(\alpha_1)$ of positive shocks.
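The role of the dummy variable can be sketched in a few lines. This is only an illustration, assuming the standard GJR-GARCH(1,1) recursion with an indicator for negative lagged shocks; the function name and the parameter values are hypothetical.

```python
def gjr_garch_variances(eps, omega, alpha1, gamma1, beta1, init_var):
    """Conditional variances from an assumed GJR-GARCH(1,1) recursion:
    s2_t = omega + (alpha1 + gamma1*I_{t-1})*eps_{t-1}^2 + beta1*s2_{t-1},
    where I_{t-1} = 1 if eps_{t-1} < 0 and 0 otherwise."""
    variances = [init_var]
    for e_prev in eps[:-1]:
        indicator = 1.0 if e_prev < 0 else 0.0
        s2 = (omega + (alpha1 + gamma1 * indicator) * e_prev ** 2
              + beta1 * variances[-1])
        variances.append(s2)
    return variances

# Hypothetical parameters with gamma1 > 0 (leverage effect).
params = dict(omega=0.1, alpha1=0.05, gamma1=0.1, beta1=0.8, init_var=1.0)
after_bad_news = gjr_garch_variances([-1.0, 0.0], **params)[1]
after_good_news = gjr_garch_variances([1.0, 0.0], **params)[1]
```

In this example a negative shock is weighted by `alpha1 + gamma1` and a positive shock only by `alpha1`, so the variance after bad news is the larger of the two.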
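The parameters of the GARCH models were estimated by the maximum likelihood method. In the case of the Student-t distribution for the independent and identically distributed random variable $z_t$ ($z_t = \varepsilon_t/\sigma_t$), the following log likelihood function needs to be maximized:

$$\ln L = T\left[\ln\Gamma\!\left(\frac{\nu+1}{2}\right) - \ln\Gamma\!\left(\frac{\nu}{2}\right) - \frac{1}{2}\ln\big(\pi(\nu-2)\big)\right] - \frac{1}{2}\sum_{t=1}^{T}\left[\ln\sigma_t^2 + (1+\nu)\ln\!\left(1 + \frac{z_t^2}{\nu-2}\right)\right]$$

where $T$ is the number of observations, $\Gamma(\cdot)$ is the gamma function and $\nu$ denotes the degrees of freedom. Considering the standard Generalized Error Distribution (GED), the log likelihood function takes the following form:

$$\ln L = \sum_{t=1}^{T}\left[\ln\!\left(\frac{\nu}{\lambda}\right) - \frac{1}{2}\left|\frac{z_t}{\lambda}\right|^{\nu} - \left(1+\frac{1}{\nu}\right)\ln 2 - \ln\Gamma\!\left(\frac{1}{\nu}\right) - \frac{1}{2}\ln\sigma_t^2\right], \qquad \lambda = \left[\frac{2^{-2/\nu}\,\Gamma(1/\nu)}{\Gamma(3/\nu)}\right]^{1/2}$$

(see Dritsaki, 2017).
In order to select between these two asymmetric GARCH-type models, the maximum log-likelihood value, the Akaike information criterion (AIC), the Schwarz information criterion (SIC) and the Hannan-Quinn information criterion were utilized.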
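The selection step can be sketched as follows, assuming the textbook definitions of the three criteria in terms of the maximized log-likelihood, the number of estimated parameters and the sample size. Some econometric packages report these values divided by the number of observations, which does not change the ranking. The candidate names and log-likelihood values below are hypothetical.

```python
import math

def information_criteria(log_likelihood, n_params, n_obs):
    """AIC, SIC and Hannan-Quinn criteria (unscaled textbook definitions)."""
    aic = -2.0 * log_likelihood + 2.0 * n_params
    sic = -2.0 * log_likelihood + n_params * math.log(n_obs)
    hq = -2.0 * log_likelihood + 2.0 * n_params * math.log(math.log(n_obs))
    return {"AIC": aic, "SIC": sic, "HQ": hq}

# Hypothetical candidates: (maximized log-likelihood, number of parameters).
candidates = {"candidate_a": (-2100.0, 8), "candidate_b": (-2105.0, 8)}
scores = {name: information_criteria(ll, k, n_obs=940)
          for name, (ll, k) in candidates.items()}
# The preferred model has the lowest value of the chosen criterion.
best_by_aic = min(scores, key=lambda name: scores[name]["AIC"])
```

When two candidates have the same number of parameters, as here, all three criteria rank them in the same order as the log-likelihood.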

Artificial Neural Network (ANN) Model
As mentioned before, stock markets exhibit non-linearities that cannot be captured by GARCH models, which assume a linear correlation structure among the data (Fahimifard et al., 2009). Therefore, employing linear models to deal with such complex problems may not produce adequate results. An ANN is a computational model that attempts to mimic the ability of the human brain to process data and extract patterns (Luo and Shah, 2007). Following the structure of the human brain, a set of neurons is connected and organized in layers. These consist of input layers, hidden layers and output layers.
One of the greatest merits of such models is that, at least theoretically, they are capable of approximating any continuous function, meaning that researchers do not need to make any assumptions about the underlying model (Pakdaman et al., 2017).
Neural networks are separated into two categories, feed-forward and feedback networks. Both consist of neurons that are connected to each other, permitting one neuron to influence other neurons. Feed-forward networks allow signals to travel only from input to output, while feedback networks allow two-way communication by introducing loops in the network. Moreover, feed-forward Neural Networks with the back propagation algorithm permit the model to re-evaluate its parameters through stochastic gradient descent, so that they are in alignment with the loss function during the estimation process (Lu et al., 2016). Stochastic gradient descent is an optimization algorithm which here minimizes the quadratic error. This study utilizes the back propagation Neural Network (the so-called BPNN), which is the most widely used in financial applications (see, among others, Hajizadeh et al., 2012; Ko, 2009; Lu et al., 2016; Tseng et al., 2008 and Wang, 2009).
To deal with the possibility of overfitting the training set and failing to capture the true statistical process of the data, we use only one hidden layer. In general, increasing the number of hidden layers also raises the danger of overfitting, which results in poor out-of-sample forecasting performance.
We chose one input layer, one hidden layer and one output layer. All three layers can be represented as vectors. The input layer is $x = (x_1, x_2, \ldots, x_d)$ and the linear summation of the inputs for the hidden unit $j$ is:

$$a_j = \sum_{i=1}^{d} w_{ji}\, x_i \tag{10}$$

In order to activate the hidden unit $j$ we transform the linear summation of Eq. 10 by employing a logistic activation function $g(a)$:

$$z_j = g(a_j) = \frac{1}{1 + e^{-a_j}}$$

The neuron of the output layer is given by:

$$y_1 = g\!\left(\sum_{j=1}^{3} w_{1j}\, z_j\right)$$

where $i$ indexes the inputs ($i = 1, 2, \ldots, d$) and $j$ indexes the hidden neurons, of which there are three ($j = 1, 2, 3$). $w_{ji}$ are the weights from the input layer to the hidden layer and $w_{1j}$ are the weights from the hidden layer towards the output layer.
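The forward pass of such a network can be sketched in a few lines. This is a minimal illustration, assuming three hidden neurons, a logistic activation in both the hidden and the output layer, and no bias terms (the text above describes only the weights); the function names and the weight values are hypothetical, and training by back propagation is not shown.

```python
import math

def logistic(a):
    """Logistic activation function g(a)."""
    return 1.0 / (1.0 + math.exp(-a))

def forward(x, w_hidden, w_output):
    """Output of a one-hidden-layer network.
    x: list of d inputs; w_hidden: one row of d weights per hidden neuron;
    w_output: one weight per hidden neuron."""
    hidden = [logistic(sum(w * xi for w, xi in zip(row, x)))
              for row in w_hidden]
    return logistic(sum(w * z for w, z in zip(w_output, hidden)))

# Hypothetical weights for d = 3 inputs and three hidden neurons.
x = [0.5, 0.2, 0.1]
w_hidden = [[0.1, -0.2, 0.3], [0.0, 0.4, -0.1], [-0.3, 0.2, 0.2]]
w_output = [0.5, -0.4, 0.3]
y1 = forward(x, w_hidden, w_output)
```

Because the output neuron is also logistic, the network output lies strictly between 0 and 1, which is why the target variable has to be scaled to that range before training.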
Moreover, to extract valuable results, it is important to normalize the data prior to feeding them into the Neural Network. To this end, we apply min-max feature scaling, which brings all the values of the data into the range [0,1] and is given by the following formula:

$$x' = \frac{x - x_{min}}{x_{max} - x_{min}} \tag{14}$$

where $x$ is the original data, $x_{min}$ and $x_{max}$ are the minimum and the maximum of the data, respectively, and $x'$ is the normalized data. The output of the Neural Network is then de-normalized by using Eq. 14 solved for $x$. Figure 1 displays the architecture of the three-layer back propagation Neural Network that will be used in this study. Furthermore, the dataset is divided into two subsets in order to construct an adequate Neural Network with real financial data: the training set, which consists of 90% of the total observations, and the test set with the remaining 10% (Lewis, 2017, p. 53).
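The normalization and de-normalization steps can be sketched as follows. The function names are hypothetical, and the sketch assumes that the series is not constant, since the scaling is undefined when the maximum equals the minimum.

```python
def min_max_scale(values):
    """Scale values into [0, 1]; also return the minimum and maximum used."""
    x_min, x_max = min(values), max(values)
    scaled = [(x - x_min) / (x_max - x_min) for x in values]
    return scaled, x_min, x_max

def min_max_invert(scaled, x_min, x_max):
    """De-normalize by solving the scaling formula for the original value."""
    return [s * (x_max - x_min) + x_min for s in scaled]

scaled, lo, hi = min_max_scale([2.0, 4.0, 6.0])
restored = min_max_invert(scaled, lo, hi)
```

One possible design choice, not stated in the text, is to take the minimum and maximum from the training set only and reuse them for the test set, so that no test-set information enters the scaling.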

Hybrid Model
In this research, we implement two hybrid models following, in some way, the work of Lu et al. (2016) for forecasting volatility of the stock market index. Initially, an ARMA model is constructed by employing the Box-Jenkins methodology and the preferred GARCH-type model (either EGARCH or GJR-GARCH model) is identified based on some criteria (Maximum Likelihood, Akaike, Schwarz and Hannan-Quinn) upon which the hybrid models are built. Then, a Neural Network is estimated from selected explanatory input variables.

Type I Hybrid Model: ANN-ARMA-GARCH
Type I model is constructed by inputting the conditional volatility outcome of the ARMA-GARCH-type model into ANN. In other words, the output of the preferred ARMA-GARCH model, i.e. the estimated conditional variance, is considered as an input variable to the ANN model in order to enhance its forecasting performance regarding the volatility of the stock market index.
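As an illustration of how the inputs of the Type I model could be assembled, the sketch below builds one input row per period from the return, its two lagged values and the conditional variance estimated by the GARCH-type model, with the realized volatility as the target. The function name and the toy numbers are hypothetical; the alignment of the series is an assumption.

```python
def build_type1_inputs(returns, cond_var, realized_vol):
    """Rows of [r_t, r_{t-1}, r_{t-2}, sigma2_t] with target RV_t.
    The first two periods are dropped because their lags are unavailable."""
    rows, targets = [], []
    for t in range(2, len(returns)):
        rows.append([returns[t], returns[t - 1], returns[t - 2], cond_var[t]])
        targets.append(realized_vol[t])
    return rows, targets

# Toy series, for illustration only.
rows, targets = build_type1_inputs(
    returns=[1.0, 2.0, 3.0, 4.0],
    cond_var=[0.1, 0.2, 0.3, 0.4],
    realized_vol=[10.0, 20.0, 30.0, 40.0],
)
```

Dropping the conditional variance column from each row gives the input set of the plain Neural Network, which makes the contribution of the GARCH-type estimate easy to isolate.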

Type II Hybrid model: ARMA-GARCH-ANN
Type II model is constructed by incorporating the output layer of the ANN model, i.e. y 1 , as a variable to the variance equation of the ARMA-GARCH-type model.  Figure 2 gives a visual representation of the process of constructing the four competitive models.

Forecast Encompassing
In order to examine the relative properties of the forecast series, two forecast encompassing tests are conducted following the work of Cook (2012). The first one is the Fair and Shiller (1989) test. This test can be derived from the following regression:

$$RV_t = c + \lambda_1 f_{1,t} + \lambda_2 f_{2,t} + u_t$$

where $RV_t$ is the realized volatility, $c$ is a constant, $u_t$ is the error term, $f_{1,t}$ is the forecast of the volatility made by model 1 and $f_{2,t}$ is the forecast of the volatility made by model 2. When $\lambda_1 = 0$ and $\lambda_2 \neq 0$, the model 2 forecast encompasses (outperforms) that of model 1. Conversely, the model 1 forecast encompasses that of model 2 if $\lambda_1 \neq 0$ and $\lambda_2 = 0$. If both $\lambda_1 \neq 0$ and $\lambda_2 \neq 0$, then neither forecast encompasses the other and both forecasts contain independent information.
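A minimal sketch of this regression is given below, assuming ordinary least squares with conventional (non-robust) standard errors and NumPy as the numerical library; the function name is hypothetical and the data are simulated so that only the second forecast carries information.

```python
import numpy as np

def fair_shiller(rv, f1, f2):
    """OLS of RV_t on a constant, f1_t and f2_t.
    Returns the coefficients (c, lambda1, lambda2) and their t-statistics."""
    y = np.asarray(rv, dtype=float)
    X = np.column_stack([np.ones_like(y), f1, f2])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    dof = len(y) - X.shape[1]
    sigma2 = resid @ resid / dof
    cov = sigma2 * np.linalg.inv(X.T @ X)
    return beta, beta / np.sqrt(np.diag(cov))

# Simulated data: RV depends on f2 only, so f2 should encompass f1.
rng = np.random.default_rng(0)
f1 = rng.normal(size=200)
f2 = rng.normal(size=200)
rv = 0.5 + 1.0 * f2 + 0.1 * rng.normal(size=200)
coefs, t_stats = fair_shiller(rv, f1, f2)
```

In this simulated case the coefficient on `f1` is close to zero while the coefficient on `f2` is close to one with a large t-statistic, which is the pattern read as the second forecast encompassing the first.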
The second test is based on the ability of one forecast to explain the error of another. The forecast error can be considered as the information that a forecast failed to capture. Denoting the forecast error as $e_{i,t} = RV_t - f_{i,t}$, the regressions of the Chong and Hendry (1986) test are given as follows:

$$e_{1,t} = c + \lambda_2 f_{2,t} + u_t, \qquad e_{2,t} = c + \lambda_1 f_{1,t} + u_t$$

If the forecast error of model 1 ($e_{1,t}$) is not related to the forecast of model 2 ($f_{2,t}$), i.e. $\lambda_2 = 0$, then forecast 1 can be used individually.
On the contrary, if the forecast error is affected by the other forecast, then a composite forecast should be formed including both $f_{1,t}$ and $f_{2,t}$.
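The second test can be sketched in the same way. The example below assumes a regression of one model's forecast error on a constant and the rival forecast, estimated by ordinary least squares with conventional standard errors; the inclusion of the constant, the function name and the simulated data are assumptions made for illustration.

```python
import numpy as np

def chong_hendry(rv, f_own, f_other):
    """Regress e_t = RV_t - f_own_t on a constant and the rival forecast.
    Returns the slope on the rival forecast and its t-statistic."""
    e = np.asarray(rv, dtype=float) - np.asarray(f_own, dtype=float)
    x = np.asarray(f_other, dtype=float)
    X = np.column_stack([np.ones_like(x), x])
    coef, *_ = np.linalg.lstsq(X, e, rcond=None)
    resid = e - X @ coef
    sigma2 = resid @ resid / (len(e) - 2)
    se = np.sqrt(sigma2 * np.linalg.inv(X.T @ X)[1, 1])
    return coef[1], coef[1] / se

# Simulated data: the rival forecast explains part of the own forecast error.
rng = np.random.default_rng(1)
f_own = rng.normal(size=200)
f_rival = rng.normal(size=200)
rv = f_own + 0.8 * f_rival + 0.1 * rng.normal(size=200)
slope, t_stat = chong_hendry(rv, f_own, f_rival)
```

A significant slope, as in this simulated case, indicates that the rival forecast contains information missed by the first forecast, so a composite forecast would be preferred.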

DATA CHARACTERISTICS
The weekly prices from January 7, 1998 to January 6, 2016 are considered as the training sample (940 observations), while those from January 13, 2016 to December 6, 2017 are used as the testing sample (100 observations). The dataset comprises weekly prices of the Italian stock market index, namely, the FTSE MIB (hereafter, MIB). This study utilizes weekly data because they are less noisy. We use Wednesday closing prices because, in general, there are fewer holidays on Wednesdays than on Fridays. Any missing Wednesday close was replaced with the closing price from the most recent trading session. Stock data were obtained from Yahoo Finance. In agreement with previous studies, the continuously compounded weekly returns $r_t$ were computed as the first log-difference, $r_t = 100 \times \ln(P_t / P_{t-1})$, where $P_t$ is the weekly closing price. Table 1 presents the preliminary statistical characteristics of the return series of the training set.
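The return calculation can be sketched as follows; the function name and the prices are hypothetical and serve only to show the log-difference formula.

```python
import math

def weekly_log_returns(prices):
    """Continuously compounded returns in percent: 100 * ln(P_t / P_{t-1})."""
    return [100.0 * math.log(p_t / p_prev)
            for p_prev, p_t in zip(prices, prices[1:])]

# Hypothetical weekly closing prices.
returns = weekly_log_returns([100.0, 110.0, 99.0])
```

A series of n prices yields n - 1 returns, since the first price has no predecessor.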
The series presents a negative mean and its standard deviation is larger than the mean value. The series displays negative skewness and high kurtosis, a fairly common occurrence in high-frequency financial data, which implies that GARCH-type models are appropriate. In addition, the null hypothesis of normality is rejected by the Jarque and Bera (1980) test statistic at the 1% level of significance. The Ljung and Box (1978) Q-statistics of the returns and of the squared returns, which are used to detect autocorrelation and heteroscedasticity, respectively, are significant, implying that the past behavior of the market may be relevant. Finally, the Augmented Dickey and Fuller (1979; 1981) unit root test indicates that the return series is stationary at the 1% level of significance.

Notes to Table 1: The unit root test is that of Dickey and Fuller (1979). The lag lengths for the ADF equations were selected using the Schwarz Information Criterion (SIC). MacKinnon (1996) critical values for rejection of the hypothesis of a unit root were applied. Q(i) and Q²(i) are the Ljung-Box statistics for serial correlation of the series and the squared series at the i-th lag (Ljung and Box, 1978).

To evaluate forecast accuracy, this study follows the work of Pagan and Schwert (1990), Day and Lewis (1992), Franses and Van Dijk (1996) and Wei (2002) and compares the volatility forecasts of the four models with realized volatility. As a robustness check, we found that our overall conclusions hold even if we use a different RV proxy, i.e. the squared returns (Brooks, 1998 and Patton, 2011). The realized volatility (RV) in week $t$ is computed as follows:

$$RV_t = (r_t - \bar{r})^2$$
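where $\bar{r}$ is the average logarithmic return. Moreover, three metrics are utilized to evaluate the performance of the models in forecasting volatility, namely, mean absolute error (MAE), symmetric mean absolute percentage error (SMAPE) and root mean square error (RMSE). The metrics are defined as:

$$MAE = \frac{1}{n}\sum_{t=1}^{n}\left|RV_t - f_t\right|, \qquad SMAPE = \frac{1}{n}\sum_{t=1}^{n}\frac{\left|RV_t - f_t\right|}{\left(\left|RV_t\right| + \left|f_t\right|\right)/2}, \qquad RMSE = \sqrt{\frac{1}{n}\sum_{t=1}^{n}\left(RV_t - f_t\right)^2}$$

where $n$ is the number of forecasts and $f_t$ is the volatility forecast for period $t$.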
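The evaluation step can be sketched as follows. The realized volatility is taken here as the squared deviation of the return from its mean, and SMAPE is computed with the mean of the absolute actual and forecast values in the denominator; both are assumptions, since several variants of each exist. The function names and the numbers are hypothetical.

```python
import math

def realized_volatility(returns):
    """Assumed proxy: squared deviation of each return from the mean return."""
    mean_r = sum(returns) / len(returns)
    return [(r - mean_r) ** 2 for r in returns]

def mae(actual, forecast):
    return sum(abs(a - f) for a, f in zip(actual, forecast)) / len(actual)

def rmse(actual, forecast):
    return math.sqrt(sum((a - f) ** 2 for a, f in zip(actual, forecast))
                     / len(actual))

def smape(actual, forecast):
    # Undefined when an actual value and its forecast are both zero.
    return sum(abs(a - f) / ((abs(a) + abs(f)) / 2.0)
               for a, f in zip(actual, forecast)) / len(actual)

# Toy values, for illustration only.
actual = [1.0, 2.0, 3.0]
forecast = [1.0, 3.0, 5.0]
scores = (mae(actual, forecast), rmse(actual, forecast), smape(actual, forecast))
```

Lower values of all three metrics indicate forecasts that are closer to the realized volatility.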

EMPIRICAL RESULTS
This study focuses only on the orders p = 1 and q = 1 of the GARCH-type models, namely, EGARCH(1,1) and GJR-GARCH(1,1). As Brooks (2008) stated, a GARCH(1,1) model is sufficient to capture the volatility clustering in financial data and higher-order models are rarely estimated. Since the kurtosis of the returns exceeds the value of 3, which indicates that a fat-tailed distribution is needed to describe the return series, Student-t and GED distributed errors were considered.
According to the Box-Jenkins methodology, the ARMA(2,2) model is the preferred one for modelling the mean of the return series. Table 2 presents the coefficients of the ARMA model as well as its residual diagnostics. The findings suggest that the returns of the MIB index are negatively influenced by their lagged values at time t−1 (−1.200) and t−2 (−0.979). Moreover, the Q statistics indicate that this specification is adequate to capture the serial correlation. However, considering the squared Q statistics and the ARCH tests, the null hypothesis of the absence of conditional heteroscedasticity is strongly rejected. Therefore, the need for GARCH models to fit the conditional heteroscedasticity of the underlying residuals is confirmed. Table 3 shows the values of the information criteria for each of the two competing asymmetric GARCH models, namely EGARCH(1,1) and GJR-GARCH(1,1), as well as for each distribution under consideration. It is apparent that the EGARCH(1,1) model with GED distributed errors is the best-fitting model, since the Akaike, Schwarz and Hannan-Quinn criteria take their lowest values while the log-likelihood takes its maximum value. Thus, it is the model selected for the construction of the hybrid models.
The estimation results from the ARMA(1,1)-EGARCH(1,1) model with GED errors are presented in Table 4. Since the parameter γ is statistically significant and negative, there are leverage effects in the Italian stock market. This indicates that fluctuations of the stock price in the Italian market have an asymmetric influence on the volatility of the MIB index. This finding suggests that investors in the Italian stock market are likely to make irrational decisions. Furthermore, based on the residual diagnostics, the model performs well, having captured the remaining ARCH effects.
Before moving to the construction of the hybrid models, we performed the forecast encompassing test of Chong and Hendry (1986) utilizing the forecasts obtained from the ARMA(1,1)-EGARCH(1,1) model and the ANN. The results are shown in Table 5. From the first regression, we can see that the residuals obtained from the EGARCH model (e EGARCH = RV−f EGARCH ) are affected by the forecasts of the ANN. The same holds in the opposite direction, as the second regression suggests. This implies that both forecast series should be included in the formation of a composite forecast series. For this reason, we use a simple mean forecast (from now on, SM) based on the forecasts of the EGARCH model and the ANN. In other words, we implement a weighted mean with equal weights, i.e. the weights for both the EGARCH forecasts and the ANN forecasts are equal to 0.5. The SM forecasts will also be compared with the rest of the out-of-sample forecasts.
To set up the Neural Network and the two hybrid models we utilized the following specifications. We used three neurons in the hidden layer and one neuron in the output layer. The target variable is the realized volatility. Moreover, the back propagation algorithm was employed with the learning rate and the threshold set to 0.01. We also utilized the sum of squared errors as the loss function and the min-max feature scaling formula for the normalization of the data, and for all layers we used the same activation function, namely the logistic function. Regarding the selection of the inputs, this paper is in line with the work of Maciel and Ballini (2010), who used lagged values of their variables as inputs to the Neural Network by applying autocorrelation analysis. In the same way, we feed the Neural Network with the returns of the MIB index and their lagged values at time t−1 and t−2, following the results of the Box-Jenkins methodology (see also the mean equation of the ARMA model in Table 2, which consists of two autoregressive terms, AR(1) and AR(2)). Furthermore, for the construction of the Hybrid model Type I (ANN-ARMA-GARCH), apart from the returns and their lagged values, the estimated conditional volatility ($\sigma_t^2$) from the EGARCH model was also considered as an input.
In order to obtain more reliable results, we compare the predictive ability of the four competing models in both in-sample and out-of-sample forecasts over four different time horizons, namely, 4-, 25-, 50- and 100-weeks ahead. Tables 6 and 7 show the values of the metrics for each model based on the de-normalized forecasts. Figures 5 and 6 present the plots.

Notes to Tables 2 and 4: *** and * indicate statistical significance at 1% and 10%, respectively. Q(i) and Q²(i) are the Ljung-Box statistics for serial correlation of the series and the squared series at the i-th lag (Ljung and Box, 1978). ARCH(i) represents the F-statistic of the ARCH test of Engle (1982) at the i-th lag.

However, this is not the case considering the SMAPE metric. The SM forecasts perform slightly worse than the ANN forecasts, yet better than those from the EGARCH model. However, this composite forecast is not able to reach the predictive power of the Hybrid I model. As expected, the values of the metrics in the out-of-sample forecasts are greater than the corresponding ones in the in-sample forecasts, highlighting that fitted values are generally closer to the actual data than forecasted ones.
Next, to further test the results obtained from the forecasting performance based on the three metrics, i.e. the superiority of the Hybrid I model, we conduct two forecast encompassing tests. The first one is the Fair and Shiller (1989) test and the results are shown in Table 8. All regressions have the realized volatility (RV t ) as the dependent variable. From the first regression, we ascertain that the coefficients of both forecasts, (f Hybrid I ) and (f EGARCH ), are significantly different from zero, meaning that neither of these models encompasses the other. In other words, both models contain independent information for 100-weeks ahead forecasting of RV t . However, from the remaining regressions, it is clear that only the individual coefficients of f Hybrid I are statistically significant, indicating that the Hybrid I forecasts encompass those of ANN, Hybrid II and SM. Table 9 presents the results of the second forecast encompassing test used in this research, i.e. the Chong and Hendry (1986) test.
Regarding the first regression of this table, and as indicated by the insignificant coefficient, f EGARCH fails to capture information that the Hybrid I model has missed. The same applies to f ANN , f Hybrid II and f SM , meaning that none of these forecasts is able to provide more information than the forecasts obtained from the Hybrid I model. On the contrary, the superiority of the Hybrid I model derived from the forecast evaluation metrics is further extended to show that the dominance of f Hybrid I is such that it forecast encompasses all the remaining forecasts. The latter result means that f Hybrid I provides all that is offered by the rest of the forecasts plus something more.
The results of the in-sample volatility forecasts are reported in Table 6. It is clear that the Hybrid Type I model (ANN-ARMA(1,1)-GARCH(1,1)-GED) outperforms the remaining models in predictive ability at all time horizons and for all de-normalized metrics. The second-best model for predicting the volatility of the MIB index is the simple Neural Network, while the volatility forecasting precision of the two GARCH-based models is far from satisfactory, with the EGARCH model being the worst. Interestingly, the metrics improve (take smaller values) as the prediction horizon increases, with the most accurate forecasts obtained at the 50-weeks horizon.
Turning to the out-of-sample volatility forecasts, Table 7 uncovers the same pattern, namely the superiority of the Hybrid Type I model over the rest of the models in terms of closeness to the realized volatility in all cases. The ANN model again comes second, while in this case the Hybrid II model is the poorest predictor of the volatility of the MIB index. Moreover, the longer the time horizon, the better the models perform, as indicated by the lower values of the MAE and the RMSE metrics.