Comparative Evaluation of Forecast Accuracies for ARIMA, Exponential Smoothing and VAR

While various linear and nonlinear forecasting models exist, multivariate methods like VAR, Exponential smoothing, and Box-Jenkins’ ARIMA methodology constitute the widely used methods in time series. This paper employs series of Turkish private consumption, exports and GDP data ranging between 1998: Q1 and 2017: Q4 to analyze the forecast performance of the three models using measures of accuracy such as RMSE, MAE, MAPE, Theil’s U1 and U2. Seasonal decomposition and ADF unit root tests were performed to obtain new deseasonalized series and stationarity, respectively. Results offer preference for the use of ARIMA in forecasting, having performed better than VAR and exponential smoothing in all scenarios. Additionally, VAR model provided better forecast accuracy than exponential smoothing on all measures of accuracy except on Thiel’s U2 whose VAR values were not computed. Cautionary use of ARIMA for forecasting is recommended.


INTRODUCTION
Tracking the overtime evolutionary path of economic variables and making forward projections help policymakers in setting, predicting and achieving both microeconomics and macroeconomic targets. This process is achieved through various univariate and multivariate forecasting methods such as the Box-Jenkins' ARIMA, exponential smoothing and Vector Autoregressive (VAR) models. ARIMA is a method that makes use of the past observations of a series for modeling. Its philosophy is based on the fact the data is suitable enough to provide important insights for future projections by following the stochastic elements of the series. The AR term in the ARIMA process represents the dependence of the series on its previous values while the MA part represents the dependence of the series on random disturbances; each with the additional error term. The I component indicates the integral number of the series, which is the number of times a series has to be differenced to be stationary. On the other hand, the VAR model is linear models that capture the joint dynamics of multiple series by taking each endogenous variable as a lagged function of all endogenous variables in the system (Sims, 1980). They are famously used for forecasting and structural analysis. Additionally, Exponential smoothing is a method that allocate weights to different series to account for fluctuations in the data. In this method, forecasts are obtained by smoothing past values of series in an exponential process that decays over the mean of the data.
Forecasting is a very crucial process as it provides information to policymakers on the best choices available among competing options. In business operations and in multinational companies, accurate forecasting of economic variables is crucial since it improves their overall profitability and provides useful understanding of risks involved. In general, forecasting economic variables has a practical aspect in that an accurate forecast can provide valuable information to the investors, government authorities and policymakers for use in the allocation of assets, in hedging risk and in policy formulation (Tindaon, 2015).
However, discrepancies arise when researchers have to choose between conflicting models that show phenomenal success in the field of forecasting science. This paper, thus, makes use of quarterly data series of the Turkish Private Consumption, Exports and GDP data ranging from 1998 Q1-2017 Q4 to undertake a comparative evaluation of the widely used time series forecasting methods of ARIMA, Exponential smoothing, and VAR to choose the best method to follow. Models are compared based on symmetric lost function. The next sections discuss literature, data, methodology, results, and conclusions.

LITERATURE
Several models have been compared to establish the ones which provide better forecast performance in many countries using various variables. The most common variables used are exchange rates, inflation, and interest rates among others. Most recent studies include Mojekwu et al. (2011) who carried out a study based on Nigeria foreign exchange rates series between 1974 and 2008 using AR (1), ARIMA (1,1,1), MA (1), IMA (1,1) and SARIMA (0,0,0)(1,1,1)12. He also used Simple, Double, Linear, Dampedtrend, Linear and Seasonal Exponential Smoothing in addition to Winters Methods with the aim of establishing the efficiency and stability of the exchange market of Central Bank of Nigeria. They conclude that the exchange market of Nigeria is the most stable, while Bureau de changes and interbank exchange rates to the US dollar fluctuate over the period under investigation with the use of SARIMA (0,0,0) (1,1,1)12. Kadilar et al. (2009) compared the performance of ANN, ARIMA and ARCH models using exchange rates series of Turkey ranging from January 2005 to January 2008. His findings show that ANN is more accurate than ARIMA and ARCH models. Using the same variable, Fat and Dezsi (2011) established that Romanian Leu is appreciating against other currencies in a study he adopted on the series of Romania Leu between 3 January 2011 and 22 April 2011. He used exponential smoothing and ARIMA models, and the results provided little preference for ARIMA. Additionally, Khashif et al. (2008) carried out a study using series between July 2001 and June 2007 in Pakistan to determine which model performs better among ARIMA, GARCH, and State Space Models. His results state that The State Space model provides the best performance among all the models. A similar study based also established preference for the GARCH (1, 2) model. Ramzan et al. (2012) used ARMA, ARCH and GARCH models to model and forecast volatility and also found the same model to be best in removing persistence in volatility. On the other hand, EGARCH (1, 2) successfully overcame the leverage effect in the exchange rate returns. Similar studies done by Erkekoglu et al. (2020) in Uganda recommend the use of PARCH (1,1) and EGARCH (1,1) in modeling and forecasting volatility. Nanayakkara et al. (2014) carried out a study based on Sri Lanka using exchange rates data from January 2007 to November 2011. They examined and compared GARCH and Neural Network approaches. Their findings revealed that ANN performs better than GARCH models, similar to those of Kasheri and Bijari (2011) and Kihoro et al (2006). In India, Biswajit (2015) found that the random walk model outperforms ARIMA and ARCH/GARCH models for an exchange rate series running from August 1994 to April 2014. Finally, VAR (1) generated the most accurate forecasts during a 1 -month horizon, while the ARIMA (1, 1, 0) is the more suitable model during a 3 -month horizon in study carried out in Sweden by Varenius (2017) on exchange rate series between the periods of January 2000 and December 2015.

Data
This paper employs quarterly GDP, Exports and Private Consumption data for the Turkish economy between 1998Q1: 2017Q2. The variables represent key subcomponents of OECD's main economic indicators and the series are long enough to conduct a study in forecasting. The series were obtained from the electronic data portal of The Republic of Turkey's Central Bank (TCMB), "http://www.tcmb.gov.tr".

Seasonality
Time series data is an agglomeration of various components such as trend (Tr), seasonal variations (Sn), cycle (C1) and an error term (e t ). Trend is the tendency of the data to move upwards for a long period while seasonal variations are yearly patterns that keep recurring in specific periods within the data. Cycles denote patterns that repeatedly occur beyond two years while error terms are erratic movements in the data (Kirkpatrick et al., 1993). It's a fundamental best practice to work with deseasonalized series obtained through the decomposition of series. Data components of the multiplicative decomposition model discussed above are represented as follows: The deseasonalized series (d t = Y t −Sn t ) obtained involve obtaining moving averages, central moving averages, error-tainted seasonal values, error-free seasonal values and finally deseasonalized series.

Stationarity
As a prerequisite for time series modeling, unit root analysis is performed to ensure that the underlying series is stationary. A series is stationary when its mean and variance are constant over time and the value of the covariance between two periods depends only on the gap or lag between the two periods and not on the actual time at which the covariance is computed (Brockwell and Davis, 1996). Since there is difficulty in establishing full stationarity, the literature only focuses on weak stationarity conditioned on the following: Mean: E(Y t ) =µ for all t (constant mean in all periods) Variance: var (Y t = E(Y t −µ) 2 = σ 2 (Variance is same in all periods) ] (autocovariance of time series concerning a particular lag is the same at every period).
Stationarity is established through graphical methods, correlogram and unit root tests (Priestley, 1981). This study considers graphical and unit root methods to establish stationarity.

Graphical analysis test of stationarity
Graphical analysis is the starting point for examining stationarity as it provides the visual clue about the nature of the series. Plots of the initial private consumption, exports and GDP series show upward trends in all three series suggesting the variance of the mean over time; a simple indication of nonstationary series. However, after differencing the series, resultant plots show that the series fluctuate around the mean with no trend; suggestive of stationarity as indicated in Figure 1 in the Appendices.

Unit root analysis for stationarity
Unit root analysis is the most popular tool for examining stationarity/nonstationary of the data. Although there are numerous methods to test stationarity, this study adopts the Augmented Dickey-Fuller (ADF) test, which is an augmented form of the traditional Dickey-Fuller. The idea of the ADF is that the following regression equation is estimated to determine if δ=0.
where ε t is the pure white noise term, Δ is the first difference operator, δ=ρ-1, and unit root test was performed on all the three series at various levels-intercept, trend and intercept and none.
The results displayed in Table 1 show ADF tests for all the three series considered. The results indicate that at levels, we cannot reject the null hypothesis for the presence of unit root in the series considered. This suggests that the series is nonstationary. However, once the series were differenced, we reject the null hypothesis and accept the alternative implying that the series are stationary after differencing.

Forecasting with Box-Jenkins ARIMA Methodology
The Box-Jenkins ARIMA Methods are atheoretic models built on the philosophy that time-series data is suitable enough to provide important insights for future projections by following the stochastic elements of the series Lee (1994). The AR component denotes the dependence of the series on its previous values while the MA part represents the dependence of the series on random disturbances each with the additional error term. The I component indicates the integral number of the series, which is the number of times a series has to be differenced to be stationary. 0.0000*** −9.492875 0.0000*** ***, **, * imply significance at the <1% <5% and <10* levels respectively. is the constant and is the trend AR (1) is represented in the following equation: In addition, MA (1) takes the following form: Altogether, ARMA (1,1) is expressed as follows: where Y t e.g. private consumption is dependent on its previous values up to the p th lag plus a constant α and a random disturbance e t .Y t in the equation above is a combination of both AR and MA components. Formally, the process is presented as ARIMA (p, d, q) where p, d, and q represent lag orders for AR, Differencing and MA process respectively. The specific assumption made for e t is that they are identically and independently distributed with a common mean and common variance σ 2 in all observations or simply ∼i.i.d.(0,σ 2 ) implying the freedom of the disturbances from autocorrelation and heteroscedasticity. This is called the white-noise assumption.
The BJ methodology involves four major steps: identification, estimation, diagnostic checking, and forecasting. In the identification process, values of p, d, and q are specified using correlogram and partial correlogram. Estimation is then made commonly through the Maximum Likelihood method or OLS using the specified values. Diagnostic checking is performed by checking whether residuals are white noise before forecasting is carried out (Box and Jenkins, 1976).

BJ model identification using ACF and PACF
Model identification is the first procedure done in Box-Jenkins (BJ) methodology to identify the appropriate p, d, and q values. This is done through autocorrelation function, partial autocorrelation function, and the respective correlograms. The choice for the appropriate model is based on the following guidelines provided in Table 2. Figure 2 in the appendices provides the results of the ACF and PACF, and the corresponding correlograms for all the series considered. It is observed that the ACF of private consumption series is significant at the first lag while the PACF exponentially dies down. This suggests an MA (1) process. The ACF and PACF of the export series are significant with spikes at the 5 th lags, suggestive of probable MA (5) process. Finally, the results for the GDP series are similar to those of the private consumption series since they have significant spikes for ACF at the first lag and an exponential decay of the PACF, indicating an MA (1) process.

BJ models estimation
Once the models have been identified, the next BJ process requires an estimation of the identified models. Our tentative models were of MA (1), MA (5) and MA (1) for the private consumption, exports and GDP series, respectively. The following results in Table 3 are obtained from our estimation.
The estimation output above are significant results at <1% for all the models.

BJ models diagnostics
To establish whether the model is reasonable enough to fit the data, diagnostics are performed. This is done by obtaining the residuals with their respective ACF and PACF and establishing whether they are white noise. Residual graphs for the series used in this case are presented in Figure 3 in the appendices section. As observed, the residuals have random movements reflecting the white noise nature. Therefore, our chosen models are appropriate for the study.

BJ step 4: forecasting
Forecasting is the final procedure in BJ methodology. It's done to make projections into the future for the values of the series. This study made an ex-post forecast for the period 2013q1-2017q4. A comparison of the real and the forecast graphs for the three series between the periods 2013q1 and 2017q4 is provided in Figure 4. As can be seen, the fit is perfect, especially during the early periods for all series except for exports in the later periods.

The Vector Autoregressive Model (VAR)
The VAR models are a seminal work of Chris Sims derived from his famous critique of large-scale traditional macro-econometric models. They are essentially multivariate linear time series models that capture the joint dynamics of multiple time series by treating each endogenous variable as a lagged function of all endogenous variables in the system (Sims, 1980). They are famously used for forecasting and structural analysis. The VAR process starts with the specification and estimation of reduced-form VAR to model diagnostics. Once models satisfy the diagnostics requirement, they are better used for forecasting or structural analysis to conduct impulse response analysis and forecast error variance decomposition. For instance, consider y t vector at time t constituted by n variables. y t = [y 1,t y 2,t …y n,t ] ' y t = G 0 + G 1 y 1−1 …G 2t-2 + G p y t−p ) + e t where: G 0 = (n*1) vector of constants G 1 = (n*n) vector of coefficients e t = (n*1) vector of white noise innovations White noise innovations are considered to be serially uncorrelated with zero and finite variance and have the following variance-    y 1,t = g 11t y 1,t−1 + g 12 y 2,t−1 + e 1,t (6b) y 2,t = g 11t y 1,t−1 + g 12 y 2,t−1 + e 1,t (6c) VAR models equally have matrix representations. For instance, one-lagged matrix notation for the variables considered in this study can be represented as follows: where the left-hand side matrix is a vector of autoregressive processes while those on the left represent matrices of coefficients, one-period AR processes, and white noise innovations respectively. Previously stated assumptions hold for the white noise innovations.

VAR estimation and specification
VARs are estimated using consistent and efficient OLS estimators executed equation by equation. In specifying the VAR model, endogenous variables are selected based on economic theory or empirical justifications. Non-stationary data is usually transformed and other exogenous variables such as constant, time trend and dummy variables may also be included in the model while taking note of the parsimonious requirement for the model. The stationarity of the VAR model is accessed by checking if the roots of the characteristic AR polynomial are invertible. After the initial estimation of the VAR model, the lag section is performed based on various information criteria which trade-off parsimony and reduction in the sum of squares. Initial lag selection is based on the rule of thumb with p = 1, p = 4 and p = 12 for annual, quarterly and monthly data, respectively. After choosing an appropriate lag using various information criteria such as Akaike, Schwarz, and Hanna-Quinn, estimation and diagnostics are performed, after which forecasting is carried out using the newly estimated VAR model. These procedures are keenly observed in this study as follows.

VAR specification: Lag selection
Lag selection is a crucial element of the VAR specification.
The following in Table 4 are the results obtained from the lag specification procedure in our VAR model. It can be observed the 7 th lag is the most preferable lag having been chosen by more criteria (Final prediction error and Akaike information criterion).

VAR model diagnostics: stationarity
We estimate the stability of the VAR model using the inverse roots of the characteristic polynomial. Our results for this procedure are presented in Figure 5 in the appendices and they show that the inverse roots of the characteristic polynomial lay within the unit cycle, suggesting that our VAR model is stationary.

Residual diagnostics: Residual graphs
To ensure whether our stationary VAR is correctly specified, we perform residual diagnostics by obtaining the residual graphs of our VAR model presented in the Figure in 6. As it can be inferred, the residuals have random movements reflecting their white noise nature. Therefore, our chosen VAR model is correctly specified.

VAR forecasting
Forecasting is one of the main applications of the VAR model. We estimated the VAR model and obtained forecast values. Figure 7 in the appendices shows a comparison of the real and the forecast graphs for the three series between the periods 2013q1 and 2017q4. As shown, the fit looks plausible, especially for the private consumption and export series.

Exponential Smoothing
Exponential smoothing is an initial empirical work of Holt (1957) and Brown (1959) on forecasting models for inventory control systems. This method assigns greater weight to the most recent series to make up for the latest fluctuations in the data. Forecast values are obtained by smoothing past values of series in an exponential process that decays over the mean of the data.
Smoothing parameters assigned offer density to each observation.
The most widely used types of exponential smoothing are Simple Exponential Smoothing (SES) and Holt-Winters' methods which are used on stationary data and series with seasonal components, respectively. This study adopts SES. The model can be expressed in form of the following equation: Exponential smoothing methods draw their popularity from their easiness to compute, accuracy and simplicity (Ostertagova and Ostertag, 2012). The comparison of the real and the forecast graphs for the three series between the periods 2013q1 and 2017q4 using exponential smoothing is presented in Figure 8. It can be observed that forecast values plausibly fit the actual values for all the series.

Forecast Evaluation
Given the presence of many forecasting methods, researchers are treated to dilemma given the absence of a specific yardstick to compare the competing models. However, the consensus is the preference of models that ensure accuracy by minimizing forecast errors associated with each model, Ryu and Sanchez (2003).
Forecast accuracy is generally measured using the mean square error (MSE), root mean square error (RMSE), mean absolute percentage error (MAPE), Thiel's U 1 and Thiel's U 2 statistics.
where A i represents the actual observations and P i the corresponding predictions for the case of U 1 whereas in Thiel U 2 they represent proposed U 2 as a pair of predicted and observed changes. Perfect forecasts are those with U 1 closer to the 0 bound while worst forecasts are those closer to 1. U 2 can be interpreted as the RMSE of the proposed forecasting model divided by the RMSE of a nochange model. U 2 values lower than 1.0 show an improvement over the simple no-change forecast.

Forecast Model Selection Criteria
From Table 5 above, considering the RMSE and MAE, ARIMA model provides better performance compared to VAR and Exponential smoothing in an inter-model analysis. We observe that the respective RMSE and MAE of the ARIMA model (0.038945 and 0.029059 for dln_consa, 0.079034 and 0.067008 for dln_expsa and 0.045760 and 0.030414 for dln_gdpsa) are much lower than the respective RMSE and MAE values for the VAR (0.046662 and 0.033823 for dln_consa, 0.090192 and 0.065262 for dln_expsa and 0.047286 and 0.034870 for dln_gdpsa) and exponential smoothing (0.067013 and 0.047326 for dln_consa, 0.101399 and 0.069019 for dln_expsa and 0.076805 and 0.052021 for dln_gdpsa). This implies that ARIMA minimizes the seriousness and dispersion of the forecast errors and provides better goodness of fit for the forecast as compared to other models based. Additionally, ARIMA better minimizes percentage errors and provides better forecast accuracy as indicated by lower MAPE and Thiel U 1 respectively. Although Thiel's U 2 values for VAR were not computed, ARIMA still performs better than exponential smoothing by recording lower values which show improvement over the naïve forecast. Although ARIMA, outperforms all the models in all scenarios, VAR has also shown consistent better performance over exponential smoothing over the three series.

CONCLUSION
ARIMA provides consistent better performance over VAR and Exponential and Exponential smoothing in all measures of accuracy. This implies that it minimizes absolute, percentage and mean deviation of errors from the original values. It also performs better on a normalized scale vindicated by its lower values of the Thiel's U 1 . VAR follows ARIMA in better performance in all measures of accuracy, although its Thiel's U 2 measures couldn't be computed. These findings are in contrast to Bilgili (2002) which established a preference for a combination of VAR-ES over ARIMA, Combining and Addfactor method.
Therefore, ARIMA provides better forecast performance than VAR and Exponential smoothing on the latest and expanded series of crucial variables of the Turkish economy.