Artigo_Trafego_CBA14

  • 8/11/2019 Artigo_Trafego_CBA14

    MODELS ON ROAD TRAFFIC FORECASTING: IDENTIFICATION AND DISCUSSION OF DIFFERENT TIME SERIES MODELS

    FERNANDO FERNANDES NETO

    Instituto de Pesquisas Tecnológicas do Estado de São Paulo (IPT) / Secretaria do Planejamento e Desenvolvimento Regional do Estado de São Paulo

    Palácio dos Bandeirantes - Av. Morumbi, 4500, 1º Andar, Sala 142, Morumbi, São Paulo/SP

    E-mails: [email protected] / [email protected]

    CLAUDIO GARCIA

    Escola Politécnica da Universidade de São Paulo, Departamento de Engenharia de Telecomunicações e Controle

    Avenida Prof. Luciano Gualberto, trav. 3, 158, Butantã, São Paulo/SP, Brasil - 05508900

    E-mails: [email protected]

    Abstract: This paper discusses and calibrates univariate models (scalar approach, SARIMAX) and multivariate models (vector approach, VAR and VEC) for forecasting traffic, measured in equivalent axles, in the Anchieta-Imigrantes system. The models that performed best in the backtesting procedure were those of the second type (vector), with a mean absolute error of approximately 3% at a monthly frequency.

    Keywords: VAR, VEC, ARIMA, SARIMA, identification, time series, toll roads

    Resumo: Neste artigo são discutidos e calibrados modelos univariados (abordagem escalar, SARIMAX) e multivariados (abordagem vetorial, VAR e VEC) para a previsão de tráfego em eixos equivalentes no sistema Anchieta-Imigrantes. Os modelos que tiveram melhor desempenho no backtesting foram os do segundo tipo (vetorial), tendo erro médio absoluto de aproximadamente 3% em uma frequência mensal.

    Palavras-chave: VAR, VEC, ARIMA, SARIMA, identificação, séries temporais, rodovias

    1 Introduction

    One of the main problems in the toll road sector is cash flow planning and forecasting. The difficulty stems from the sector's idiosyncratic complexity (e.g. levels of service, seasonal effects and the inertial evolution of traffic) and from the impact of other variables, such as the Gross Domestic Product (GDP).

    A wide range of methods has been applied to traffic forecasting: time series models, Kalman filter based models, neural networks, Markov chain models, simulation models and linear regression models, as shown by Bolshinsky and Freidman (2012), or a combination of them, according to Fillatre et al. (2005). These methods are applied to data ranging from high frequency to low frequency.

    It is also important to note that, despite the rich literature on traffic forecasting, little attention has been paid to the predictive ability of most of these methods, as can be seen in (Bain, 2009). In fact, there is a considerable error range in U.S. traffic forecasts, as pointed out by the same author: "actual traffic turned out to lie between 86% below forecast to 51% above forecast. This considerable error range illustrates the possible magnitude of uncertainty when traffic risk is passed to the private sector."

    Hence, planning and forecasting play a fundamental role in this field: most of the necessary investments and, consequently, the associated decisions and cash outflows must take into account a very long timeline (conception, construction, maturation of the project until full capacity, etc.).

    Thus, the main goal of this paper is to discuss alternative traffic forecasting methods for toll roads, in this case vector autoregressive models (namely VAR and VEC) and univariate time series models based on Seasonal ARIMAX, both discussed in the next section, and to illustrate them on one of the most important highway systems in Brazil, the Anchieta-Imigrantes System.

    This paper is divided into the following sections: introduction, methodology, presentation of the problem, results, analysis of the results and conclusion.

    2 Methodology

    2.1 Univariate Models

    The univariate approach in the present paper is based on Seasonal ARIMAX models, a natural extension of the classical ARIMAX models. A Seasonal ARIMAX model is the product of two ARIMAX polynomials: one captures the regular structure of the time series and the other its seasonal structure, as can be seen in (Morettin and Toloi, 2004; Box and Jenkins, 1976; Hamilton, 1994).
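For reference, the product structure described above is commonly written as a regression with seasonal ARIMA errors. The block below is a sketch in conventional notation, none of which is taken from the paper: B is the backshift operator, s = 12 is assumed for monthly data, x_t stands for the exogenous regressors, and the polynomial symbols are the usual textbook ones.

```latex
Y_t = \beta' x_t + N_t, \qquad
\phi(B)\,\Phi(B^{s})\,(1-B)^{d}\,(1-B^{s})^{D}\,N_t
  = \theta(B)\,\Theta(B^{s})\,a_t, \qquad s = 12

\phi(B) = 1 - \phi_1 B - \dots - \phi_p B^{p}, \qquad
\Phi(B^{s}) = 1 - \Phi_1 B^{s} - \dots - \Phi_P B^{Ps}

\theta(B) = 1 + \theta_1 B + \dots + \theta_q B^{q}, \qquad
\Theta(B^{s}) = 1 + \Theta_1 B^{s} + \dots + \Theta_Q B^{Qs}
```

Here (p, d, q) describe the regular structure and (P, D, Q) the seasonal structure of the series.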

    2.2 Multivariate Models

    The multivariate models are mainly based on Vector Autoregression (VAR) models. These are nothing more than a multivariable extension of the classical scalar autoregression (AR) models, in the sense that the process is described in terms of matrices and vectors instead of scalars. Thus, there is a mutual causality relationship between all variables in this dynamic system. For example, a VAR(p) process can be written as:

    $$Y_t = \Phi_1 Y_{t-1} + \Phi_2 Y_{t-2} + \dots + \Phi_p Y_{t-p} + a_t \qquad (1)$$

    where the $\Phi_i$ terms are square matrices of order n; $Y_t \in \mathbb{R}^n$ are n x 1 vectors of endogenous variables; $a_t$ is an n x 1 vector of uncorrelated residuals; n is the number of endogenous variables and p is the number of lags.

    In addition, as with the classical scalar autoregression (AR) models, if all variables are stationary this model can be estimated by Ordinary Least Squares (OLS). On the other hand, when one or more variables in a VAR model are non-stationary, the OLS results may no longer be valid. Consequently, the theory of cointegration was developed in order to analyze the possible relationships between non-stationary time series.
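As an illustration of OLS estimation in the stationary case, the following minimal sketch simulates a two-variable VAR(1) and recovers its coefficient matrix. The coefficient values, sample size and function names are hypothetical and chosen only for the example; the paper's own estimates were obtained with GRETL.

```python
import random


def simulate_var1(phi, n_obs, seed=0):
    """Simulate Y_t = phi * Y_{t-1} + a_t for two variables with N(0, 1) shocks."""
    rng = random.Random(seed)
    y = [[0.0, 0.0]]
    for _ in range(n_obs):
        prev = y[-1]
        y.append([
            phi[0][0] * prev[0] + phi[0][1] * prev[1] + rng.gauss(0.0, 1.0),
            phi[1][0] * prev[0] + phi[1][1] * prev[1] + rng.gauss(0.0, 1.0),
        ])
    return y


def estimate_var1_ols(y):
    """OLS estimate of phi: (sum Y_t Y_{t-1}') times the inverse of (sum Y_{t-1} Y_{t-1}')."""
    sxx = [[0.0, 0.0], [0.0, 0.0]]  # sum of Y_{t-1} Y_{t-1}'
    syx = [[0.0, 0.0], [0.0, 0.0]]  # sum of Y_t Y_{t-1}'
    for prev, cur in zip(y[:-1], y[1:]):
        for i in range(2):
            for j in range(2):
                sxx[i][j] += prev[i] * prev[j]
                syx[i][j] += cur[i] * prev[j]
    det = sxx[0][0] * sxx[1][1] - sxx[0][1] * sxx[1][0]
    inv = [[sxx[1][1] / det, -sxx[0][1] / det],
           [-sxx[1][0] / det, sxx[0][0] / det]]
    return [[sum(syx[i][k] * inv[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]


# Hypothetical stationary system (both eigenvalues lie inside the unit circle).
true_phi = [[0.5, 0.1], [0.2, 0.3]]
phi_hat = estimate_var1_ols(simulate_var1(true_phi, n_obs=5000))
print(phi_hat)
```

With a long stationary sample the estimate should land close to the true matrix; the same regression applied to non-stationary series is where the validity concerns discussed above arise.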

    Furthermore, Granger and Newbold (1974) discussed and exposed the problem of spurious regressions over non-stationary time series. They verified that, given two completely unrelated non-stationary series, a regression of one on the other may produce an apparently significant relationship.

    Therefore, if two variables are non-stationary and have a long-run equilibrium relationship, they may be cointegrated: each series is non-stationary on its own, but a stable relationship holds between them, as shown by Engle and Granger (1987), Ashley and Granger (1979) and Johansen (1988).

    Vector Error Correction (VEC) models were developed for this case. They can be seen as extensions of the VAR in which an error correction term is introduced, according to Hendry and Juselius (2000, 2001) and Lütkepohl (1991).

    In order to verify the cointegration assumption, the approach adopted in this paper is first to verify that all variables are non-stationary, using the Augmented Dickey-Fuller test (Dickey and Fuller, 1979) at a 95% confidence level. Then, if and only if the variables are non-stationary, following Engle and Granger (1987), the cointegration residuals are obtained by running a regression over the variables, and these residuals are tested for stationarity. If the residuals are stationary (tested again with the Augmented Dickey-Fuller test), the time series are cointegrated; otherwise they are not.
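The two-step procedure above can be sketched as follows. This is a simplified illustration on simulated data, not the paper's implementation: it uses a plain Dickey-Fuller regression without the lagged differences of the augmented test, and the critical value of -3.34 is an assumed approximate 5% value for a two-variable residual-based test that should be checked against published tables.

```python
import random


def ols_fit(x, y):
    """Slope and intercept of the regression y = intercept + slope * x."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def dickey_fuller_t(u):
    """t-statistic of rho in  delta_u[t] = rho * u[t-1] + e[t]  (no constant, no lags)."""
    lagged = u[:-1]
    delta = [cur - prev for prev, cur in zip(u[:-1], u[1:])]
    sxx = sum(v * v for v in lagged)
    rho = sum(v * d for v, d in zip(lagged, delta)) / sxx
    resid = [d - rho * v for v, d in zip(lagged, delta)]
    s2 = sum(e * e for e in resid) / (len(resid) - 1)
    return rho / (s2 / sxx) ** 0.5


# Simulated example: x is a random walk and y shares its stochastic trend.
rng = random.Random(42)
x = [0.0]
for _ in range(499):
    x.append(x[-1] + rng.gauss(0.0, 1.0))
y = [2.0 * xi + rng.gauss(0.0, 1.0) for xi in x]

# Step 1: regress one variable on the other and keep the residuals.
slope, intercept = ols_fit(x, y)
residuals = [yi - intercept - slope * xi for xi, yi in zip(x, y)]

# Step 2: test the residuals for stationarity.
t_stat = dickey_fuller_t(residuals)
CRITICAL_5PCT = -3.34  # assumed approximate critical value, see lead-in
cointegrated = t_stat < CRITICAL_5PCT
print(slope, t_stat, cointegrated)
```

A strongly negative statistic rejects a unit root in the residuals, which is the evidence of cointegration that the procedure looks for.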

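    In order to explain how the VEC model structure is obtained, one can start from a dynamic system with two variables that are cointegrated by hypothesis, following (Hendry and Juselius, 2000, 2001; Lütkepohl, 1991, 2004; Morettin, 2011).

    Let $Y_{1,t}$ and $Y_{2,t}$ be two non-stationary cointegrated variables, and assume that there is an equilibrium relation between them given by:

    $$Y_{1,t} - \beta Y_{2,t} = \varepsilon_t, \quad \varepsilon_t \sim N(0, \sigma^2) \qquad (2)$$

    If the variations in $Y_{1,t}$ and $Y_{2,t}$ depend on the deviations from this equilibrium at t-1, it follows that:

    $$\Delta Y_{1,t} = \alpha_1 (Y_{1,t-1} - \beta Y_{2,t-1}) + a_{1,t}, \quad a_{1,t} \sim N(0, \sigma_1^2) \qquad (3.1)$$

    $$\Delta Y_{2,t} = \alpha_2 (Y_{1,t-1} - \beta Y_{2,t-1}) + a_{2,t}, \quad a_{2,t} \sim N(0, \sigma_2^2) \qquad (3.2)$$

    One can generalize this error correction model into a more general form, where the corrections towards the equilibrium may also depend on previous changes in the variables, due to possible autocorrelations:

    $$\Delta Y_{1,t} = \alpha_1 (Y_{1,t-1} - \beta Y_{2,t-1}) + \gamma_{1,1} \Delta Y_{1,t-1} + \gamma_{1,2} \Delta Y_{2,t-1} + a_{1,t}, \quad a_{1,t} \sim N(0, \sigma_1^2) \qquad (4.1)$$

    $$\Delta Y_{2,t} = \alpha_2 (Y_{1,t-1} - \beta Y_{2,t-1}) + \gamma_{2,1} \Delta Y_{1,t-1} + \gamma_{2,2} \Delta Y_{2,t-1} + a_{2,t}, \quad a_{2,t} \sim N(0, \sigma_2^2) \qquad (4.2)$$

    This model is in fact a VAR model in the levels of the variables (of order two, because $\Delta Y_{t-1}$ involves $Y_{t-2}$). To verify that, one can simply put this pair of equations into matrix form, resulting in:

    $$\Delta Y_t = \alpha \beta' Y_{t-1} + A \Delta Y_{t-1} + a_t \qquad (5)$$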

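    where:

    $$\alpha = \begin{bmatrix} \alpha_1 \\ \alpha_2 \end{bmatrix}, \quad \beta' = \begin{bmatrix} 1 & -\beta \end{bmatrix}, \quad A = \begin{bmatrix} \gamma_{1,1} & \gamma_{1,2} \\ \gamma_{2,1} & \gamma_{2,2} \end{bmatrix} \qquad (6)$$

    or, rewriting in levels:

    $$Y_t = (\alpha \beta' + A + I)\, Y_{t-1} - A\, Y_{t-2} + a_t \qquad (7)$$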

    According to Gujarati and Porter (2011), this relationship can be generalized and is guaranteed by the Granger Representation Theorem, which shows that any VAR(p) can be written as a VEC(q) and vice versa.

    Depending on the autocorrelation structure, it may be useful to have both a VEC(q) model and its respective VAR(p). More details can be found in (Greene, 2002).
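The equivalence between the error correction form of Equation (5) and the levels form of Equation (7) can be checked numerically. The sketch below uses hypothetical values for the adjustment vector, the cointegrating vector and the matrix A (none of them from the paper), feeds the same shocks to both recursions and measures the largest gap between the two paths.

```python
import random


def vec_to_var2(alpha, beta, a_mat):
    """Levels-form matrices: pi1 = alpha beta' + A + I and pi2 = -A."""
    n = len(alpha)
    pi1 = [[alpha[i] * beta[j] + a_mat[i][j] + (1.0 if i == j else 0.0)
            for j in range(n)] for i in range(n)]
    pi2 = [[-a_mat[i][j] for j in range(n)] for i in range(n)]
    return pi1, pi2


def step_vec(alpha, beta, a_mat, y_lag1, y_lag2, shock):
    """One step of the error correction form: dY_t = alpha beta' Y_{t-1} + A dY_{t-1} + a_t."""
    n = len(alpha)
    equilibrium_error = sum(beta[j] * y_lag1[j] for j in range(n))
    d_lag = [y_lag1[j] - y_lag2[j] for j in range(n)]
    return [y_lag1[i] + alpha[i] * equilibrium_error
            + sum(a_mat[i][j] * d_lag[j] for j in range(n)) + shock[i]
            for i in range(n)]


def step_var2(pi1, pi2, y_lag1, y_lag2, shock):
    """One step of the levels form: Y_t = pi1 Y_{t-1} + pi2 Y_{t-2} + a_t."""
    n = len(y_lag1)
    return [sum(pi1[i][j] * y_lag1[j] + pi2[i][j] * y_lag2[j] for j in range(n))
            + shock[i] for i in range(n)]


# Hypothetical parameters; beta holds the row vector beta' = [1, -0.5].
alpha = [-0.2, 0.1]
beta = [1.0, -0.5]
a_mat = [[0.3, 0.1], [0.0, 0.2]]
pi1, pi2 = vec_to_var2(alpha, beta, a_mat)

rng = random.Random(7)
path_vec = [[1.0, 2.0], [1.5, 2.5]]
path_var = [[1.0, 2.0], [1.5, 2.5]]
max_gap = 0.0
for _ in range(50):
    shock = [rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)]
    path_vec.append(step_vec(alpha, beta, a_mat, path_vec[-1], path_vec[-2], shock))
    path_var.append(step_var2(pi1, pi2, path_var[-1], path_var[-2], shock))
    max_gap = max(max_gap, max(abs(a - b) for a, b in zip(path_vec[-1], path_var[-1])))
print(pi1, pi2, max_gap)
```

The two paths agree up to floating-point rounding, which is the sense in which the error correction model is a restricted VAR in levels.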

    3 Presentation of the Problem

    This paper considers a VAR and a VEC model with the following variables, all of them endogenous: traffic and Gross Domestic Product (GDP). It also considers two kinds of univariate SARIMAX models, one with a seasonal difference plus a stochastic seasonal shock, and another with an autoregressive seasonal term.

    The GDP series is available from the IPEA (Instituto de Pesquisa Econômica Aplicada, the Brazilian Institute of Applied Economic Research) website, while the other series are publicly available upon request to ARTESP (Agência Reguladora de Transportes do Estado de São Paulo, the Transportation Regulatory Agency of São Paulo State, Brazil). The time series span observations from March 31st, 1998 to July 31st, 2013. The last six observations are held out to test the forecast accuracy of the models.

    A main concern is that treating the Gross Domestic Product as an endogenous variable may be counter-intuitive. However, traffic is known to act as a leading indicator of GDP behavior, and this assumption is tested in this paper through the verification of cointegration between both variables.

    The traffic was normalized to an equivalent vehicle base, in order to convert different types of vehicles into cars, e.g. a heavy truck is equivalent to n cars, while a light truck is equivalent to n-2 cars.

    Seasonality in the vector models was handled by including a vector of dummy variables, since the data are monthly.
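A minimal sketch of how such monthly dummy variables can be built is shown below. The function name and the choice of December as the reference month (absorbed by the constant) are assumptions made for the example; the paper does not state which month was used as the baseline.

```python
def monthly_dummies(months, reference_month=12):
    """One 0/1 column per calendar month, except the reference month."""
    kept = [m for m in range(1, 13) if m != reference_month]
    return [[1 if month == m else 0 for m in kept] for month in months]


# Calendar months of five hypothetical observations: Mar, Apr, May, Dec, Jan.
months = [3, 4, 5, 12, 1]
rows = monthly_dummies(months)
for month, row in zip(months, rows):
    print(month, row)
```

Each observation activates at most one dummy, so the seasonal effect of a month is the constant plus the coefficient of that month's dummy, and the constant alone for the reference month.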

    Then, with all the time series normalized and the seasonal effects accounted for, the rank of cointegration and the number of lags must be established.

    In this case, the rank of cointegration is the number of cointegrating vectors, which is tested according to (Johansen, 1988), and the number of lags is the one that minimizes the information criterion, in both univariate and multivariate models, as suggested in (Lütkepohl and Krätzig, 2004). For the multivariate models, the Bayesian Information Criterion was chosen because it imposes stronger penalties on the inclusion of new parameters, and this kind of model naturally has a larger number of parameters. For the univariate models, the Akaike Information Criterion was used, because these models generally have fewer parameters than the multivariate ones.
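The difference in penalties can be made concrete with the usual definitions, AIC = -2 ln L + 2k and BIC = -2 ln L + k ln n. In the sketch below the log-likelihoods and parameter counts per lag are hypothetical, and n = 179 is only the in-sample length implied by the sample dates, assuming no missing months.

```python
import math


def aic(log_likelihood, n_params):
    """Akaike Information Criterion: each parameter costs 2."""
    return -2.0 * log_likelihood + 2.0 * n_params


def bic(log_likelihood, n_params, n_obs):
    """Bayesian Information Criterion: each parameter costs ln(n_obs)."""
    return -2.0 * log_likelihood + n_params * math.log(n_obs)


n_obs = 179  # assumed in-sample length (March 1998 to July 2013, minus six held out)

# Hypothetical candidates: lag -> (log-likelihood, number of parameters).
candidates = {1: (-520.0, 6), 2: (-515.0, 10), 3: (-513.5, 14)}

best_by_aic = min(candidates, key=lambda lag: aic(*candidates[lag]))
best_by_bic = min(candidates, key=lambda lag: bic(*candidates[lag], n_obs))
print(best_by_aic, best_by_bic)
```

Because ln(179) is about 5.19, each extra parameter costs more than twice as much under BIC as under AIC, so BIC tends to select the shorter lag structure.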

    The estimation of the parameters and all the tests mentioned were computed using GRETL (Gnu Regression, Econometrics and Time-series Library) for the multivariate models and R for the univariate models.

    4 Results

    Table 1 shows the results of the Bayesian Information Criterion lag search for the multivariate models.

    Table 1. Bayesian Information Criterion of the Lag Search

    As can be seen in this table, the multivariate models must have only one lag.

    For the univariate models, the most common lag compositions of shocks and autoregressive terms were tested down using the auto.arima function, provided in the forecast package of the R statistical software, to find the optimal regular ARIMA structure. This resulted in an ARIMA polynomial of the form ARIMA (p=1, d=1, q=4).

    Then, the two most usual seasonal polynomials were calibrated: SARIMA (p=1, d=0, q=0) and SARIMA (p=0, d=1, q=1).

    The rank of cointegration was determined according to the Johansen test (1988). For a null-rank matrix (the null hypothesis), the p-value is 0.03. So, the statistical evidence points to no cointegrating relationship between the variables. Despite that, the VEC model was still estimated in this paper for comparison purposes.

    Thus, four different models were obtained, as follows.

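    Seasonal Model with Seasonal Difference:

    [Equation (8) did not survive text extraction. The legible fragments are a coefficient printed as 4864.0 (most plausibly 0.4864) on a lag t-1 term of Y; shock coefficients 0.0447, -0.2215, -0.0978, -0.5514 and -0.6753, which appear to belong to a_{t-1}, a_{t-2}, a_{t-3}, a_{t-4} and a_{t-12}; and the value 34710.72.]    (8)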

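    Seasonal Model with Autoregressive Seasonal Components:

    [Equation (9) did not survive text extraction. The legible fragments are a coefficient printed as 5280,0 (most plausibly 0.5280) and a coefficient 0.8141 on lagged terms of Y with subscripts 1 and 12; shock coefficients 0.0231, -0.2902, -0.1227 and -0.5641, which appear to belong to a_{t-1}, a_{t-2}, a_{t-3} and a_{t-4}; and the digits 039.26791.]    (9)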

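    VAR Model with Seasonal Dummies:

    $$\begin{bmatrix} Y \\ PIB \end{bmatrix}_t = \Phi_1 \begin{bmatrix} Y \\ PIB \end{bmatrix}_{t-1} + \begin{bmatrix} K_1 \\ K_2 \end{bmatrix} \qquad (10)$$

    [The entries of the 2 x 2 coefficient matrix $\Phi_1$ did not survive text extraction; only the digit strings 7520925230 and 19735000190 remain, with their decimal points displaced.]

    where $K_1$ and $K_2$ are the seasonal dummies, as given in the following table: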

    Table 2. Seasonal Parameters Estimates of the VAR Model

    Thus, if the month to be predicted is January, one must add the coefficient S1 to the constant, and so on according to the month being predicted.

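    Finally, the VEC model with seasonal dummies is presented as follows.

    $$\begin{bmatrix} Y \\ PIB \end{bmatrix}_t = \begin{bmatrix} 0.74791 & 9.769 \\ 0.0019 & 0.0247 \end{bmatrix} \begin{bmatrix} Y \\ PIB \end{bmatrix}_{t-1} + \begin{bmatrix} 9.769 \\ 0.0247 \end{bmatrix} \begin{bmatrix} PIB - 0.0765\,Y \end{bmatrix}_{t-1} + \begin{bmatrix} K_1 \\ K_2 \end{bmatrix} \qquad (11)$$

    where $K_1$ and $K_2$ are the seasonal dummies, as given in the following table: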

    Table 3. Seasonal Parameter Estimates of the VEC Model

    5 Analysis of the Results

    To select the best model, the out-of-sample forecasting accuracy is measured in terms of the mean absolute error, as follows.

    Table 4. Out-of-sample Errors of the Models

    The very surprising result is that the VEC(1) model, which according to the existing literature should not even be estimated, is the best model in terms of out-of-sample performance. Nonetheless, it was already expected that a multivariate model should perform better than a univariate one, because more information is included.
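The out-of-sample comparison can be illustrated with a short sketch of the holdout split and the error measure. It assumes that the reported percentage errors are mean absolute percentage errors over the six held-out months; the series and forecasts below are hypothetical numbers, not the paper's data.

```python
def split_holdout(series, n_holdout=6):
    """Separate the estimation sample from the last n_holdout observations."""
    return series[:-n_holdout], series[-n_holdout:]


def mean_absolute_percentage_error(actual, forecast):
    """Average of |actual - forecast| / |actual|, expressed in percent."""
    errors = [abs(a - f) / abs(a) for a, f in zip(actual, forecast)]
    return 100.0 * sum(errors) / len(errors)


# Hypothetical monthly traffic series (in thousands of equivalent vehicles).
series = [90.0, 95.0, 92.0, 98.0, 101.0, 99.0,
          100.0, 110.0, 120.0, 130.0, 140.0, 150.0]
train, test = split_holdout(series)

# Hypothetical forecasts for the six held-out months.
forecasts = [103.0, 107.8, 120.0, 133.9, 135.8, 150.0]
mape = mean_absolute_percentage_error(test, forecasts)
print(round(mape, 2))
```

Only the estimation sample is used to fit each model; the held-out months enter solely through the error calculation, which is what makes the comparison a backtest.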

    Another interesting fact is that the log-likelihoods of the univariate models are far better than those of the multivariate ones, as can be seen in the following table, where the model with the least log-likelihood is the best one.

    Table 5. Log-Likelihood of the Models

    Hence, based on these results, the backtesting procedure appears to be a very important part of the modeling process, since the log-likelihood estimate does not provide all the information necessary to determine which model is the best.

    When comparing the models' fitted values against the observed values (Obs in Figure 2), it is possible to verify that the Seasonal ARIMAX models converge more slowly towards the observed values than the vector-based models. This can be explained by the fact that the univariate seasonal models rely on past observed values to forecast the seasonal factors. The vector-based models (Figure 1), on the other hand, rely on deterministic seasonal dummy variables. Thus, although past values are unknown to the autoregressive part, there are already values being inserted in the model, providing estimates of the seasonal fluctuations.

    Another interesting point is that the multivariate models, despite having a larger number of variables, performed worse within the sample. So, basically, the models that were actually overfitted were the univariate ones.

    Finally, the most important feature of vector models in terms of policy analysis is shown here: the impulse response structure that can be retrieved from the system, following (Sims, 1980).

    This method is based on the decomposition of the covariance matrix using a Cholesky algorithm, to obtain what is called a Structural VAR/VEC. Consider a VAR with contemporaneous relationships, as in the expression below:

    $$\Phi_0 Y_t = \Phi_1 Y_{t-1} + \Phi_2 Y_{t-2} + \dots + \Phi_p Y_{t-p} + K + a_t \qquad (12)$$

    Multiplying the whole equation by the inverse of $\Phi_0$, one gets a VAR as in Equation (1), which can be estimated using the traditional OLS algorithm.

    Therefore, after decomposing the covariance matrix, it is possible to impose causal restrictions in order to retrieve the contemporaneous relationship matrix.

    So, for example, if the economy (GDP) is expected to cause the traffic on the road, one may infer how the dynamics between the time series behave from an impulse response of the traffic to the GDP.

    This is a powerful tool that enables the researcher to verify dynamic effects, instead of just applying a first-order (linear) approximation, as in the traditional simple linear regression over the logarithms of the variables (this procedure is actually called elasticity calculation).

    Figure 3. Impulse-Response of Trafego to a Shock in PIB

    As can be seen in Figure 3, a standard shock (a unitary shock in terms of the covariance matrix retrieved from the VAR/VEC models) in the evolution of the GDP causes an increase of 50 thousand vehicles after 4 months, and the response stabilizes after 5 months.
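The mechanics behind such an impulse response can be sketched for a two-variable VAR(1). The coefficient matrix and residual covariance below are hypothetical, and the variables are ordered as [GDP, traffic], which is the assumed causal restriction: under a Cholesky factorization a GDP shock may move traffic within the same month, but not the other way round.

```python
def cholesky_2x2(sigma):
    """Lower-triangular P such that sigma = P P' for a 2 x 2 covariance matrix."""
    p11 = sigma[0][0] ** 0.5
    p21 = sigma[1][0] / p11
    p22 = (sigma[1][1] - p21 * p21) ** 0.5
    return [[p11, 0.0], [p21, p22]]


def mat_mul(a, b):
    """Product of two 2 x 2 matrices."""
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]


def impulse_responses(phi, sigma, horizon):
    """Orthogonalized responses phi^h P for h = 0, ..., horizon."""
    p = cholesky_2x2(sigma)
    power = [[1.0, 0.0], [0.0, 1.0]]
    responses = []
    for _ in range(horizon + 1):
        responses.append(mat_mul(power, p))
        power = mat_mul(phi, power)
    return responses


# Hypothetical VAR(1) with variables ordered as [GDP, traffic].
phi = [[0.5, 0.0], [0.2, 0.4]]
sigma = [[4.0, 2.0], [2.0, 5.0]]
irf = impulse_responses(phi, sigma, horizon=6)

# Response of traffic (row 1) to a one-standard-deviation GDP shock (column 0).
traffic_to_gdp = [step[1][0] for step in irf]
print(traffic_to_gdp)
```

Reading the sequence horizon by horizon gives the dynamic path that a figure such as Figure 3 plots; changing the ordering of the variables changes the identifying assumption and therefore the responses.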

    6 Conclusion

    This paper showed that it is possible to build a multivariable autoregressive model to describe the traffic data of one of the most important toll roads in Brazil, with significant seasonal effects and a large number of vehicles.

    Four kinds of models were estimated: a VAR, a VEC and two kinds of Seasonal ARIMAX models. Furthermore, methodologies were discussed for testing cointegration between the variables, testing for unit roots and obtaining the optimal lag structure.

    Both multivariate methodologies produced very similar forecasts, as did both univariate models. Despite that, the two kinds of models differed significantly in the long run and in the short run, with the first kind (multivariate) being the best, producing reasonable forecasts with a mean absolute error of about 3%.

    Nonetheless, it is important to note that this paper shows the usefulness of impulse-response analysis, which seems to be far more reasonable for policy analysis than the traditional elasticity measures obtained from simple linear regression models.

    As a perspective for future work, it is suggested to expand this analysis to other large road systems in Brazil and in other countries; to continue updating the existing database and checking for possible structural and parameter changes in these models; and to include in the comparison the performance of NARX models (nonlinear autoregressive models) and standard neural-network based models, either using only autoregressive components of the dependent variable or evaluating the inclusion of other candidate independent variables (e.g. GDP).

    References

    ASHLEY, R.A., GRANGER, C.W.J. (1979). Time series analysis of residuals from the St. Louis model. In Journal of Macroeconomics, vol. 1, 373-394.

    BAIN, R. (2009). Error and optimism bias in toll road traffic forecasts, Working Paper, RePEc.

    BOLSHINSKY, E., FREIDMAN, R. (2012). Traffic flow forecast survey. Tech. rep., Technion - Israel Institute of Technology.

    BOX, G.E.P., JENKINS, G.M. (1976). Time Series Analysis: Forecasting and Control. 1st Edition, San Francisco, Holden-Day.

    DICKEY, D.A., FULLER, W.A. (1979). Distribution of the estimators for autoregressive time series with a unit root. In Journal of the American Statistical Association, vol. 74, 427-431.

    ENGLE, R.F., GRANGER, C.W.J. (1987). Cointegration and error correction: Representation, estimation and testing. In Econometrica, vol. 55, 251-276.

    FILLATRE, L., MARAKOV, D., VATON, S. (2005). Forecasting Seasonal Traffic Flows. Workshop EuroNGI, Paris, December.

    GRANGER, C.W.J., NEWBOLD, P. (1974). Spurious Regressions in Econometrics. In Journal of Econometrics, vol. 2, 111-120.

    GREENE, W.H. (2002). Econometric Analysis, 5th Edition, Upper Saddle River, New Jersey, Prentice Hall.

    GUJARATI, D.N., PORTER, D.C. (2011). Econometria Básica, Editora Bookman, São Paulo.

    HAMILTON, J.D. (1994). Time Series Analysis, 1st Edition, Princeton, New Jersey, Princeton University Press.

    HENDRY, D.F., JUSELIUS, K. (2000). Explaining Cointegration Analysis: Part 1. In The Energy Journal, International Association for Energy Economics, vol. 21, no. 1, 1-42.

    HENDRY, D.F., JUSELIUS, K. (2001). Explaining Cointegration Analysis: Part 2. In The Energy Journal, International Association for Energy Economics, vol. 22, no. 1, 75-120.

    IPEADATA, available at http://www.ipeadata.gov.br, accessed on 01/11/2013.

    JOHANSEN, S. (1988). Statistical analysis of cointegration vectors. In Journal of Economic Dynamics and Control, vol. 12, 231-254.

    LÜTKEPOHL, H. (2004). Applied Time Series Econometrics, 1st Edition, New York, Cambridge University Press.

    LÜTKEPOHL, H. (1991). Introduction to Multiple Time Series Analysis, Heidelberg, Springer-Verlag.

    MORETTIN, P.A. (2011). Econometria Financeira: Um Curso em Séries Temporais Financeiras, 1ª Edição, São Paulo, Editora Edgard Blücher.

    MORETTIN, P.A., TOLOI, C. (2004). Análise de Séries Temporais, 1ª Edição, São Paulo, Editora Edgard Blücher.

    SCHWARZ, G. (1978). Estimating the dimension of a model. In The Annals of Statistics, vol. 6, 461-464.

    SIMS, C. (1980). Macroeconomics and Reality. In Econometrica, vol. 48, no. 1, 1-48.