## Monthly Forecasts of Integrated Public Transport Systems: The Case of the Madrid Metropolitan Area

## Monthly Forecasts of Integrated Public Transport Systems: The Case of the Madrid Metropolitan Area

**ANTONIO GARCíA-FERRER*****ARÁNZAZU DE JUAN*****PILAR PONCELA***

Universidad Autónoma de Madrid**MARCOS BUJOSA***

Universidad Complutense de Madrid

### ABSTRACT

The Madrid public transportation system has been integrated since 1986 and includes bus, Metro, and suburban trains. This paper addresses the problem of forecasting the demand for a large number of bus and Metro tickets in the Madrid metropolitan area using monthly data from 1987 to 2002. The database is subject to several calendar effects, outliers, changing levels of service, and changing seasonality effects that further complicate the analysis and the models' forecasts. The transport agency needs estimates of all effects, as well as a forecast, of the pattern of monthly revenues and usage of the transport network. We use both traditional dynamic transfer-function causal models as well as new variants of unobserved component models estimated by least squares using automatic identification and linear techniques in the optimization on the frequency domain (BGF algorithm). Both types of models provide some interesting forecasting comparisons (using several forecast horizons that include turning points) where the pooling forecast is also used. Forecast accuracy is assessed using traditional root mean squared error and mean absolute error measures as well as variants of the Diebold-Mariano test.

### INTRODUCTION

Worldwide, traffic management is one of the main problems that large cities face. Given the costs associated with private car usage, most metropolitan transport agencies try to coordinate services, networks, and fares so as to offer consumers a higher capacity and higher quality service, with the aim of promoting public transport use and shifting demand away from private cars.

When planning transportation facilities, it is necessary to forecast how much they will be used. Also, to price them appropriately and determine the best operating policies, it is important to understand how users respond to changes in prices and service characteristics. Because the issue of price elasticities has received considerable attention in the literature (see, e.g., García-Ferrer et al. 2003; Matas and Raymond 2003, for the case of Spain), in this paper we will concentrate on alternative models to produce efficient predictions on public transportation using recent monthly data for the Madrid (Spain) region.

Public transportation in the Madrid metropolitan area encompasses four basic modes: the subway system (Metro), the municipal bus company (EMT), the RENFE suburban train service, and interurban buses. Because of data restrictions and service characteristics, the last two modes are not included in our analytical framework.^{1} *Consorcio de Transportes de Madrid* (CTM), which was created in 1986, manages the entire public transportation system. CTM coordinates the efforts of public and private institutions related to public transport. Examples of similar policies in other European countries are well documented in Pucher and Kurth (1996).

As happens in many European cities, public transport fares in the Madrid region are based on both single-mode and multiple-mode tickets, the principal one being CTM's travel card. The travel card is a multimodal pass and coupon that can be used without limit during its valid time period, being directed mainly toward regular and prepaid passengers. Fares vary by travel zone. In particular, the introduction of less expensive season tickets can provide a powerful incentive to shift transport modes and is not without theoretical support (see, e.g., Carbajo 1988; FitzRoy and Smith 1999).

At the end of 2001, 175 municipalities representing practically the entire population of the Madrid region belonged to CTM. Despite decreasing population in recent years, the number of passengers using these services has grown from 951 million in 1986 to 1,549 million in 2001. The entire public transportation system had a 4.2% increase in demand during 2001. Suburban train services experienced a 9.5% increase in demand and suburban bus use rose 6.5%. This was not only a reflection of the recent investment and coordination efforts, but also the result of a growing residential suburbanization process in the Madrid region, rail and bus service improvements, and economic incentives that travel cards provide for long-distance travelers.

Although all modes have experienced steady growth during the past few years, the Metro system showed significant increases, recording a rise in demand of 9.6% in 1999, 9.3% in 2000, and 3.7% in 2001. As we discuss later, this growth has been fueled primarily by a large increase in Metro services in terms of the number of lines, stations, and rolling stock. In particular, the Metro route length grew 43% during 1998 and 1999, and the recently inaugurated 2003 *Metro-Sur Circle Line* added 36% more capacity that is expected to affect demand in the near future.

In this paper, we focus on monthly forecasts of the number of tickets/cards sold. Monthly forecasts are desirable for several reasons. First, such forecasts would allow CTM to obtain a pattern of future monthly revenues (just multiplying prices by the number of tickets/cards sold), which are lower, for instance, in summer months, and thus plan how much cash it will need in addition to monthly revenues. This target information can be gained with 1- to 12-steps-ahead forecasts in an on-time exercise. Second, we will be able to forecast the degree of usage of the transport system. For example, the frequency and size of the trains are lower in August. Again, for this purpose, monthly forecasts are needed. Third, by using causal models with monthly data, we are able to evaluate the effect of the number of working days within a month and the effect of Easter, as well as any other exogenous effects. Overall, short- and medium-term forecasts are our main target, which will be evaluated through 1-, 6-, 12-, and 24-steps-ahead forecasts.

The paper is organized as follows: the next section presents the characteristics of the database and tests for the presence of many outliers that create considerable estimation instability and complicate the posterior forecasting exercise. We next describe the theoretical framework and present the estimation results, after which we analyze the predictive performance of alternative models based on several forecasting horizons and different predictive accuracy criteria. The issue of optimal forecast combinations is also addressed. Finally, the last section provides our conclusions.

### THE DATABASE

Our database includes CTM monthly data from January 1987 to December 2002 for the main public transport variables:

single-trip Metro tickets (SMT),

10-trip Metro tickets (10MT),

single trip bus tickets (SBT),

10-trip bus tickets (10BT),

regular travel card (TC), and

junior travel card (JTC).

The data are the number of tickets/cards sold in each category. Therefore, while the number of single tickets sold equals the number of trips, we do not know how many trips a holder of a travel card actually took. Definitions of the variables are given in table 1, and their plots (in logs) are shown in figure 1, where both nonstationarity as well as strong seasonality are clearly evident. The aberrant observations in both SBT and 10BT in the early months of 1992 correspond to a severe general strike on the bus network that took place during that period. Table 1 also includes the definition of the price variables that will be used later in the causal models. The descriptive statistics of the variables are given in table 2.

The introduction of travel cards in 1987 caused a drastic change in the trends of use for the remaining tickets. This process has been the logical consequence of a price policy that penalizes the users of single- and 10-trip tickets through large price increases, while holding the price of travel cards constant from their introduction in 1987 until 1992. During this period, for instance, single-ticket prices increased 130% and, although this price trend has been reversed recently, regular users found multimodal travel cards much more advantageous when purchasing tickets. As a result, the demand shares have shifted dramatically during the last 15 years. Single-ticket use fell to just over 4% for both Metro and EMT, while the market share of 10-ride tickets dropped significantly among bus users and remained almost unchanged in the case of Metro.^{2}

Finally, the market share for travel cards in 2002 was over 60%, which was about the figure that CTM had in mind when the integrated system was established in 1986. In this regard, recent empirical studies (Garcia-Ferrer et al. 2003) have shown that, with the exception of travel cards, the remaining tickets show significant negative own-price elasticities and moderate estimated cross-elasticities, indicating that there is room for alternative pricing policies aimed at having positive effects on demand while minimizing the negative effects on revenues.

### METHODOLOGIES AND ESTIMATION RESULTS

The plots of the main variables in figure 1 indicate that the statistical characteristics of such series changed considerably over the sample interval, so that the series can be considered nonstationary in a statistical sense. All series exhibited clear upward or downward trends, together with pronounced annual periodicity. This trend behavior is a classic example of a local mean value changing markedly over time. The nature of the seasonality varied over the six series but, in general, signs of changing seasonality patterns in the trend were evident.

These various kinds of nonstationarity are indicative of changes in the underlying statistical properties of the data. Therefore, we decided to use two alternative statistical approaches capable of characterizing the nonstationary features in an acceptable manner. One is the well-known Dynamic Transfer Function Causal Model obtained using the Intervention ARIMA (IARIMA) model developed by Box and Tiao (1975) as a starting model. The second alternative is the Dynamic Harmonic Regression (DHR) model developed by Young et al. (1999). For the latter, however, both the identification and estimation stages were carried out using the Bujosa-García-Ferrer (BGF) algorithm implemented in Bujosa et al. (2002), which is summarized in the appendix. Although the previous approaches are, basically, univariate alternatives, they also allow for the possibility of including exogenous inputs associated with intervention effects (e.g., strikes, working days, Easter effects) as well as price and service changes in the system.

#### Dynamic Transfer Function Causal Model

When using annual or quarterly data, demand equations can be based on causal models where the demand for transport services can be assumed to depend on the attributes of each mode (monetary costs and quality of service), the competing modes of transport, and certain socioeconomic and demographic variables (e.g., income, employment, and population). Using this approach means care must be taken when interpreting estimation results due to a large number of potential econometric problems. Issues of dynamic specification of the model, small sample size, large number of regressors (small number of degrees of freedom), and extreme multicollinearity should be seriously considered before interpreting the estimates as current elasticities. Multicollinearity in this context is a problem, especially when models are specified in levels. It is well known that multicollinearity may affect estimated standard errors and signs of the coefficients, providing misleading results. Far from being a good solution, the often-used alternative of eliminating statistically insignificant regressors may be even worse in terms of interpreting results. Let us emphasize, however, that this is not a model problem (whether our model is causal or not) but a data problem that can equally affect other modeling alternatives, as we will show later.

When using monthly data, as we did in this paper, the use of causal models is restricted by data availability. Lacking monthly income, employment, and even population data for the Madrid region, our causal model was based only on fare and service quality changes plus the corresponding deterministic intervention effects and stochastic seasonality variation. Nevertheless, when short- and medium-term forecasting is the objective, we can assume that the first set of factors are reasonably incorporated in the stochastic long-term trends of the monthly data. In the case of independent submarkets for each type of ticket, our dynamic causal model can be written as:

(1)

where *x _{it}* are the exogenous or intervention variables,

*v*(

_{i}*L*) includes the dynamic model for the

*i*-th intervention variable,

*θ*(

*L*) and Θ(

*L*) are the regular and seasonal moving average operators, and

^{s}*φ*(

*L*) and Φ(

*L*) are the regular and seasonal autoregressive operators. The process

^{s}*a*is assumed to be white noise, and ∇

_{t}^{D}and ∇

^{d}allow for seasonal and nonseasonal differencing. The usual stationarity and invertibility conditions apply for the

*AR*and

*MA*operators. As indicated by Peña (2001), most outlier specifications in the literature (additive, innovational, level shifts, and transitory changes) are particular cases of equation (1) under proper parameterizations of the polynomial operator

*v*(

_{i}*L*). Detailed definitions of the different intervention variables are included in table 3.

On the other hand, *z _{jt}* is a vector of exogenous variables that includes price and service changes of different tickets, and

*∂*(

_{j}*L*) represents the dynamic model for the

*j*-th exogenous variable. Before going into the estimation details, the following are comments regarding the components of the

*z*.

_{jt}First, rates of growth of deflated prices for all tickets are included in *z _{jt}*. It can be shown that price changes are identical for SMT and SBT, very similar for 10BT and 10MT, and exhibit a high correlation (

*ρ*= 0.81) between price changes of 10MT and TC. This severe collinearity will have important consequences in assessing posterior robustness on parameter estimation. While it is well known that multicollinearity does not have any effect on forecasting, one should be cautious when interpreting the coefficients. However, the estimation results we present here are robust to alternative model specifications, except for 10MT, as a consequence of the high multicollinearity between price changes of 10MT and TC (see García-Ferrer et al. 2003 for details).

Second, service quality can be measured using several indicators: route length, number of stations, number of trains or buses, vehicle-kilometers, etc. Since all indicators are highly collinear, we decided to use route length as the main indicator of service quality. Apart from these indicators, there are no data available on other service quality indicators.

The estimation of the models was done by exact maximum likelihood using SCA statistical software. Estimation results are shown in tables 4 and 5, but we will discuss each table separately. This leads us to the first issue related to the choice of the estimation period for each variable. Although most of the database covers the period January 1987 to December 2002, some problems preclude using this whole period as a generalized database. First, monthly data on SBT and 10BT are not available before 1988. Second, the TCs from 1987 to 1990 include information that was inconsistent with the posterior 1991 to 2001 data. Due to the gradual introduction of different travel card options from 1987 to 1990, the data generation process cannot be considered identical before and after 1990. Third, in the case of JTC, the early period 1987 to 1990 database suffers from identical problems and, consequently, was discarded.

For the Metro and bus estimates in table 4, the following results are noteworthy:

- All variables in the table are affected by the same difference operators as the endogenous variable, but prices are only affected by a ∇
_{12}seasonal difference because they are already given as rates of growth. Therefore, by integrating out the seasonal difference on both sides of each equation, the estimated coefficients can be interpreted as elasticities. - The effect of trading days is positive for all tickets, ranging from 0.7% for SBT to 1.6% for the 10BT variable. The size of the effects on the Metro tickets is very similar.
- On the contrary, the Easter effect is negative (as expected) for all tickets. However, the size of the effect is larger in the case of the 10-ride tickets (both for Metro and EMT).
- The remaining intervention variables are related either to strikes or to large increases in public transport fares. Nevertheless, effects vary among different types of tickets. In the case of SMT, for instance, we found positive and statistically significant coefficients in APR91 and FEB92, which are associated with strikes in the competing mode (bus). The negative effect in JAN91 can be explained by a Metro strike, while the mixed effects in the MAR90 coefficients correspond, respectively, to a bus strike and to a large price increase in SMT the following month, which explains why this effect does not show up in the 10MT equation. The long bus network strike seen in the FEB92 variable shows temporary fluctuations of 80.2%, 28.9%, and 12.2% in February, March, and April 1992 with respect to the previous month in the same year and an even larger change (82.7% in February and 36.3% in March) in the case of 10BT.
^{3}These figures show that fewer bus ticket purchases were only compensated by a mild increase in single-Metro ticket purchases (+9.4%) during that period, as a result of less Metro network coverage at that time. Finally, the JAN98 variable corresponds to the introduction of a new Metrobus ticket that negatively affected single-ticket sales permanently after its introduction. - The estimated coefficients associated with price increases are the corresponding elasticities. In general, users were highly sensitive to single- and 10-ride fares with own-price elasticities values ranging from 0.52 to 2.47. However, while the estimated elasticities in the cases of SMT, SBT, and 10BT proved to be robust to alternative model specifications, this was not the case for 10MT as a consequence of the high multicollinearity between price changes of 10MT and TC.
^{4}The remaining cross elasticities were in line with those obtained by García-Ferrer et al. (2003). The only significant evidence is found in the case of the two 10-trip tickets. For 10MT, positive cross effects were found for single tickets (0.76) and travel cards (2.63). Similar results were found for 10BT although the size of the cross effects was much smaller than in the Metro case: 0.22 for single tickets and 0.91 for travel cards. - The Ljung-Box statistics at 12- , 24- , and 36-month lags show no signs of residual autocorrelation in the estimated models.

For the travel card estimates, the following results are worth noting (see table 5):

- The estimation period was shorter (108 observations), given the anomalies mentioned earlier at the beginning of both series.
- For the model estimated for JTC, all variables were affected by the same difference operators as the endogenous variable, but prices were only affected by a ∇
_{12}transformation, because they were already given as rates of growth. By integrating out the seasonal difference in both sides of each equation, the estimated coefficients can be interpreted as elasticities. Nevertheless, in the TC equation, this variable was affected only by a seasonal difference and the straightforward interpretation of the coefficients as elasticities was not possible. - The effect of trading days is almost negligible and only significant for JTC. This is a logical result given the (monthly card) characteristic of these tickets.
- The Easter effect is statistically significant and has the expected negative signs for both variables. The quantitative impacts (5.3% and 7.1%) are similar to the ones observed in single- and 10-trip tickets.
- The remaining intervention variables are additive outliers that had not appeared earlier. The negative sign of APR93 is associated with service interruption and the temporary closing of certain lines. The CTM does not have any reasonable explanation for the positive coefficients associated with FEB93, MAR94, and JUL95.
- Travel card tickets (TC and JTC) were not affected either by their own price increases or by the price increases of competing tickets. This result is hardly surprising given the CTM pricing policy, which held the price of travel cards constant (during a large part of the sample) since their introduction in 1987. Among the service variables included in the causal model's information set, the Metro route length (
*MRL*) now becomes statistically significant for the two travel card variables._{i}^{5}The estimated dynamic responses of this variable indicate a long-term effect that lasts up to five months, implying a very long and smooth demand response to changes in supply. - The Ljung-Box statistics at 12- , 24- , and 36-month lags do not indicate serial correlation problems in the estimated models.

#### Dynamic Harmonic Regression Model

The DHR model developed by Young et al. (1999) belongs to the Unobserved Component (UC) type and is formulated within the state-space framework. This model is based on a spectral approach, the hypothesis of which maintains that the observed time series can be decomposed into several DHR components whose variances are concentrated around certain frequencies. This hypothesis is appropriate if the observed time series has well-defined spectral peaks, which implies that its variance is distributed around narrow frequency bands. Basically, the method attempts to: 1) identify the spectral peaks, 2) assign a DHR component to each spectral peak, 3) optimize the hyper-parameters that control the spectral fit of each component to its corresponding spectral peak, and 4) estimate the DHR components using the Kalman Filter and the Fixed Interval Smoothing (FIS) algorithms.

In the univariate case, the DHR model can be written as a special case of the univariate UC model that has the general form:

*y _{t} = T_{t} + S_{t} + e_{t}* ; t = 0, 1, 2, ..., (2)

where *y _{t}* is the observed time series,

*T*is the trend or low-frequency component,

_{t}*S*is the seasonal component, and

_{t}*e*is an irregular component normally distributed with zero mean value and variance

_{t}*σ*. These types of models have been extensively studied in the literature (see, e.g., Kitagawa 1981; Harvey 1989; and West and Harrison 1997).

^{2}_{e}Equation (2) is appropriate for dealing with economic data exhibiting pronounced trend and seasonality as is the case with the monthly variables used in this paper. When set in state-space form, each component is modeled in a manner that allows *y _{t}* to be represented as a set of discrete-time equations that are the basis for recursive state-space estimation and forecasting. Let us introduce specific representations for

*T*and

_{t }*S*.

_{t}*Trend Model*

Together with its derivative *D _{t}*, the low-frequency component

*T*can be described by the following second-order generalized random walk (GRW) model,

_{t}where *η _{t}* = [

*η*

_{1t}

*η*

_{2t}]' is a white noise vector with zero mean and covariance matrix

*. For simplicity, we assumed that*

**Q***is a diagonal matrix diag (*

**Q***q*

_{11},

*q*

_{22}), with unknown elements

*q*

_{11}and

*q*

_{22}. This GRW model subsumes as special cases:

- the very well-known random walk

(*R W*;*α*= 1,*β*=*γ*= 0,*q*_{22}= 0); -
the smooth random walk

(*S R W*; 0 <*α*< 1,*β*=*γ*= 1,*q*_{11}= 0); and -
the integrated random walk

(*I R W*;*α*=*β*=*γ*= 1,*q*_{11}= 0).

In the current example, the BGF algorithm (Bujosa et al. 2002) identifies an IRW model for all trends. In this case,

*T _{t}* =

*T*

_{t − 1}+

*D*

_{t − 1}

*D*=

_{t}*D*

_{t − 1}+

*η*

_{2t}(3)

where *T _{t}* and

*D*can be interpreted as the level and slope (derivative) of a time variable trend. The variance of

_{t}*η*

_{2t}(

*q*

_{22}) is the only unknown in equation (3) and can be estimated through the

*noise variance ratio*(NVR), the ratio between

*q*

_{22}and the variance of the irregular component

*σ*

^{2}

_{e}:

(4)

When smoothing the series with the Kalman Filter in order to estimate the trend, the *NVR _{T}* works as a smoothing parameter. Very low

*NVR*values are indicative of near deterministic linear trends. In the limit, when

_{T}*NVR*= 0, the estimated trend is linear. On the other hand, for large

_{T}*NVR*values the estimated trend mimics the original time series

_{T}*y*.

_{t}*Seasonal Model *

We assume that the seasonal component in equation (2) can be represented by a

(5)

where the regression coefficients are time variable to handle nonstationary seasonality. As in the previous trend model, time variation inand may follow any variant within the GRW framework. The BGF algorithm identifies that parameter variation in *a _{j}* and

*b*is modeled as an RW process,

_{j}.(6)

This assumption is very useful for time series with growing amplitude seasonality as the one found in most series in this paper.

The presence of important outliers, however, can easily affect both the estimation and the posterior identification and estimation stages of the DHR models. To avoid these effects, we implemented the following iterative process using our own Matlab code:

- Initial identification and estimation of DHR models was proposed.
- Using the Kalman Filter and FIS algorithms on the original series, outliers are treated as missing values and variance intervention was used (Young and Ng 1989) to handle level changes due to large variations in fares or the introduction of new types of tickets.
- New series were reconstructed where each outlier was substituted by its estimated value and the level shifts were accounted for through variance intervention.
- Finally, using the reconstructed series, we returned to step 1.

In all series in this paper, robust results were obtained after three iterations and the *NVR* estimation results, shown in table 6, correspond to the third iteration. Also, in all cases, the identification stage suggests an *IRW* trend component and an *RW* model for the seasonal component of period 12 and its harmonics (6, 4, 3, 2.4, and 2). Again, as in the causal models, the Ljung-Box statistics did not indicate any evidence of residual autocorrelation. Note that this iterative procedure was not necessary on the causal models, since specific intervention variables were properly introduced in the model's specification.

Finally, plots of the estimated trend and seasonal components for the six variables (not shown) indicate that, in all cases, the estimated *NVR _{T}* for the trends are different from zero, confirming the absence of deterministic linear trends but suggesting (apart from the breaks) very smooth long-term behavior. On the other hand, the estimated seasonal components are indicative of a clear changing seasonality pattern confirming the inappropriateness of deterministic seasonal schemes for this dataset.

### FORECASTING

A large body of literature exists on the subject of travel demand forecasts (see, e.g., Gillen 1977; Lave 1994; Baumgartner 1995; Slavin 1996; and Lythgee and Wardman 2002, among others). The motivation for these types of forecasts is self-evident, because the most likely reason for making forecasts is to assess different policies. The level of detail and the forecasting horizons change considerably among different studies as a function of the travel demand model, the characteristics of the data, and the number of input variables. Broadly speaking, these external influences can be separated into two components: those related to demographic and economic changes as well as other external variables, and those directly related to the public transportation system. When long-term planning is the goal, both components need to be predicted and both involve economic assumptions about their future behavior. However, when using monthly data and when (as in our case) short- and medium-term forecasts are the objective, we can assume that the first set of factors are reasonably incorporated in the stochastic long-term trends. As we will show later, it is the second set of factors (basically abrupt changes in travel supply and changing seasonality patterns) that is responsible for large variations in public travel demand in the Madrid metropolitan area and the presence of forecast errors during certain months.^{6}

Several periods-ahead forecasts for the six variables considered in this paper (using the causal and DHR models) were done for 2000, 2001, and 2002. The initial estimation sample ends in December 1999. We made 1-, 6-, 12-, and 24-months-ahead forecasts. We re-estimated all models adding one data point at a time and made new predictions. We proceeded in the same way until the last sample data point was available for estimation (November 2002 for 1-step-ahead forecasts, June 2002 for 6-steps-ahead forecasts, December 2001 for 12-steps-ahead forecasts, and December 2000 for 24-steps ahead forecasts). The forecasting for these periods was difficult given the large increase in Metro services over this time period, particularly during 1999. This difficulty meant a mixture of effects that influenced not only total demand but also temporary passenger shifts from Metro to EMT as a consequence of closing Metro stations and opening temporary new bus lines.

#### Measures of Forecast Accuracy

The literature repeatedly stresses the adoption of a variety of error measures (e.g., among others, Armstrong and Collopy 1992; Fildes 1992), because the selection of the best forecasting technique may depend on the choice of a particular accuracy measure. In this section, we computed different measures of forecast accuracy for each variable, model, and forecast horizon. Conflicts among them (although not desirable) only indicate different goals of alternative prediction exercises. The first measure is the root mean squared error (RMSE) defined for horizon *h* and model *i *as

where *N* is the number of forecasts, *t* is the number of observations used for estimation, *e _{i , t + h}* is the forecast error defined as true value minus the forecast produced with model

*i*(either causal or DHR).

The second measure is the mean absolute error (MAE) defined as

Also, to assess the forecast performance of both models, particularly in the medium term, we computed annual percentage errors (APE) and forecasted growth rates (FGR) and compare them with observed growth rates (OGR). Annual percentage errors for a certain year obtained with model *i* = *DHR, causal *and forecast horizon *h* are defined as

(7)

where *n* runs over the months within a year, *t* represents December of the previous year and *y*_{i, t + n|t + n - h }is the forecast obtained with model *i* for month *n* with observations up to the previous *h* months. The annual forecast growth rates are defined as

(8)

Although both RMSE and MAE are the usual yardsticks of forecast accuracy, for longer forecasting horizons (beyond the usual one-step-ahead period), their sole use may be not only inappropriate but misleading (García-Ferrer and Queralt 1997). In this case, we contend that FGR and APE become more relevant criteria. Table 7 presents the results for the RMSE and MAE, while the APE and FGRs are shown in table 8.

As might be expected, no model dominates the other under all the accuracy criteria and forecasting horizons. However, the following tentative conclusions can be drawn from these tables.

**For the aggregate MAE and RMSE criteria** (table 7), both indicators generally agree on which model performs better. As expected, both models' forecasting performance deteriorates as the forecast horizon grows. Only the causal model for 10BT shows similar MAE and RMSE values for different forecast horizons without signs of deterioration. For this variable, the DHR model deteriorates only in the 24-steps-ahead forecast horizon. When considering both criteria, all variables, and all forecast horizons (48 cases), the causal model emerges as a winner in 35 cases, while the DHR model is best in 12 cases. In one case (RMSE, *h* = 12, TC variable), the empirical results are identical for both models. However, as we will show later, some differences are not statistically significant when using the Diebold-Mariano test.

**For the APE and FGR criteria,** results change considerably among variables, forecast year, and forecasting horizons. In general, 1-step-ahead forecasts are excellent for both models, although the causal model outperforms the DHR in 24 out of 36 cases. For this forecast horizon, a large number of APE results are below 1%. The largest APE values are 2.77% for the causal model and 3.36% for the DHR model. For 6- and 12- steps-ahead forecasts, results are only computed for 2001 and 2002. In the first case (6-steps-ahead), the causal model outperforms its competitor in 15 out of 24 cases. Its median APE is 1.58%, while the one corresponding to the DHR is 2.66%. In the second case (12-steps-ahead), results indicate a similar performance (each model performing best in 12 cases), although the median APE is 2.83% for the causal model and 3.70% for the DHR model.

Finally, 24-steps-ahead results are only available for 2002. For this particular year, forecasts deteriorate considerably for SMT and SBT as a consequence of huge drops in their tickets sales during that year. Both models perform in a similar fashion, although the median APE is 8.30% for the causal model and 6.70% for the DHR case. One interesting byproduct of both APE and FGR criteria is the presence of signs that may be indicative of potential gains (by canceling errors of different signs) of combining forecasts over individual forecasts. According to table 8, for instance, there are, a priori, plausible forecasting gains for JTC (at all forecast horizons) and also for SBT, 10BT, and SMT for certain years and forecast horizons.

#### Statistical Comparison of Forecast Accuracy

In this subsection, we perform the Diebold-Mariano test (1995) to compare the forecast accuracy of the two models. (For a recent revision of this type of test see, e.g., Mariano 2002.) We will also apply its small sample version (Harvey et al. 1997). We concentrate on the mean squared error (MSE) loss function. Denote by *g*(*e _{i,t+h}*) the loss function for model

*i*= 1, 2 and by

*d*=

_{t}*g*(

*e*)

_{1,t+h}*g*(

*e*) the loss differential between the two models. There is no statistical difference between the two forecasting procedures if the expected value of the loss differential is zero. Therefore, the null hypothesis is

_{2,t+h}*H*

_{0}:

*E*(

*d*) = 0 versus the alternative

_{t}*H*

_{1}:

*E*(

*d*) ≠ 0. Diebold and Mariano (1995) use the following test statistic

_{t}(9)

where

(10)

and is a consistent estimator of *ƒ*, the asymptotic variance of under the null. We will compute as

(11)

with *M* being the lag truncation and the sample autocovariance of order * k* of the loss differential series {*d _{t}* }. The asymptotic distribution of

*DM*is

*N*(0, 1). Harvey et al. (1997) proposed a small sample modification of the previous test when

*g*is the MSE function

(12)

where now the lag truncation is taken as *M* = *h* 1 for the *h*-steps-ahead forecast (recall that errors in *h*-steps-ahead forecasts are a moving average process of order *h* 1), and Harvey et al. (1997) compared the *DM** statistic with a *t* distribution with *N* 1 degrees of freedom.

Table 9 presents the results of the two versions of Diebold and Mariano's test (*DM* and *DM**) of the squared loss function for 1-, 6-, and 12-steps-ahead forecasts. For 24-steps-ahead forecasts, the forecast errors could be distributed as *MA*(23), so we do not have enough predictions (only 13) to properly estimate the variance, and the test cannot be applied. As table 9 shows, in 12 out of 18 cases the differences between the two forecasting methods are not statistically significant. Positive significant values indicate that the DHR model outperforms the causal model, which occurs only for 12-steps-ahead forecasts in the 10MT variable. On the contrary, negative significant values indicate that the causal model outperforms the DHR model, which occurs in six cases when using the *DM* test but is reduced to three when using its modified small sample version. The overall conclusion from the application of the test is that no model systematically dominates the other. This leads us to consider forecast combinations as a possible way to improve individual forecasts.

#### Forecast Combination

Forecast combination is discussed in the literature as a useful device to improve accuracy. Several types of linear combinations of forecasts can be computed: simple averages, weights based on the precision of forecasts, and regression methods, among others (see, e.g., Diebold 2003).

Let *y*_{i, t + h|t} represent the *h*-steps-ahead forecast obtained with model *i* = *DHR*,*C* (where *C* stands for causal) with information up to time *t*. An easy way to compute a forecast combination is to average the forecasts of the two competing models

.(13)

Table 10 presents the RMSE for the DHR and causal models, as well as the combined forecast for all variables and horizons considered. From this table, we can conclude the following:

- The combined forecast does not always improve the RMSEs of the individual forecasts.
- Nevertheless, it is never worse than
*y*_{DHR, t + h|t}and*y*_{C, t + h|t}at the same time. - In half the cases (12 out of 24), the RMSE of the combined forecast is between the RMSEs of the individual models.
- In the remaining cases (12 out of 24), the combined forecast is better than or equal to the previous forecasts (DHR and causal).
- The improvement is observed mainly for
*h*= 1 (with the exception of the 10BT) and for the JTC variable, as the table 8 results suggest.

The rationale behind these results might be the following. Theoretically, for horizon 1, the forecast errors are white noise, so the probability of these errors from the two models being a different sign is 0.5. When this occurs, the forecast average might be closer to the real value, because the average compensates for overprediction by one method with underprediction in the other. For longer forecast horizons, the theoretical forecast errors behave as *MA*(*h* 1). If there is a turning point, it is possible that none of the models captures it, especially for a large *h*, and the probability of forecast errors of different signs might be lower without any possibility of compensation. Because the major gains are observed in the case of JTC, we concentrated on this variable to analyze other forecast combinations.

We analyzed two other forecasting combinations: linear regressions and variance-covariance methods. (For a recent review of these methods see, e.g., Newbold and Harvey 2002.) Consider the following regression

*y _{t + h}* =

*β*+

_{0}*β*+

_{1}y_{1, t + h | t}*β*+

_{2}y_{2, t + h | t}*e*(14)

_{t + h}where *e _{t+h}* can behave as

*MA*(

*h* 1) and let

(15)

where is the OLS estimator of *β _{i}*, and the sub-indices (1 and 2) show the two competing models. By using the regression equation (15), problems such as severe multicollinearity are possible, especially if the processes to be forecast are not stationary. It is also important to have a large enough number of forecasts in order to estimate the regression coefficients, leaving the remaining

*h*to make true ex-ante forecasts. Otherwise, the true values of the variables are used to estimate the coefficients, and the associated RMSEs are only in-sample (rather than out-of-sample) accuracy measures.

The error variance of the linear combination of the two forecasts

(16)

with 0 ≤ *λ* ≤ 1, is minimized if

(17)

where *σ *^{2}_{1} and *σ *^{2}_{2} are the variances of the forecast errors of the DHR and causal models, respectively, and *ρ* is the correlation coefficient. Note that this is the same as the regression method when *β*_{0} = 0 and *β*_{1} + *β*_{2} = 1. When the correlation coefficient between the two forecast errors is zero, *λ* can be estimated from the data as

(18)

Table 11 shows RMSE values for 1-step-ahead forecasts of JTC using both original and combined alternatives. In columns 5 and 6, we report the forecasts based on linear combinations formed with OLS coefficients as in equation (15). We tried two different alternatives: 1) using all the forecast sample to estimate the coefficients (REG in table 11), and 2) using the coefficients estimated with the data for 2000 (12 data points) to compute the linear combination forecasts for 2001 and 2002 (REG(12) in table 11). In column 7, we present the RMSE of the linear combination of the forecasts equation (16) when λ is estimated in equation (18). For comparison purposes, we also added in columns 2 to 4 the RMSE of the DHR, causal, and average forecasts. Notice that the smallest RMSE is given through the regression method REG, but recall that both REG and VAR forecasts are not truly ex-ante, because they use the future values of the variable in order to estimate the coefficients of the linear combinations. When we restrict the sample used in the regression equation (14) to the observations corresponding to 2000 and use these coefficients to compute the true ex-ante forecasts, the RMSE for 2001 and 2002 deteriorates to 0.047. We also tried to make true ex-ante forecasts using the restricted sample with the variables in first differences, but the RMSE deteriorated even more. Overall, the best forecasts were obtained by the average forecast of the DHR and causal models.

### CONCLUSIONS

The recent experience of the Madrid metropolitan area shows that it is possible to reverse declining historical trends in public transport ridership. This was achieved through an integrated fare scheme based on low-cost travel tickets and improvements in the quality of service. To adequately plan for future public transportation facilities requires reliable predictions of public transport demand that take into account the users response to changes in prices and the characteristics of the service.

Using recent monthly data, we addressed the problem of forecasting the demand for a large number of tickets that are subject to multiple, complex calendar effects, changing supply service, and changing seasonality, in addition to superimposition of outliers. Two different approaches were used to deal with these issues. The first one is a causal model based on a transfer function dynamic model that allows the incorporation of intervention and exogenous variables in a flexible way. The other is the Dynamic Harmonic Regression model, a new variant of unobserved component models with time-varying parameters, that allows the trend and the seasonal components to adapt as soon as the new information becomes available.

Both methodologies are capable of dealing with the nonstationarity and strong seasonality features that characterize the database. Additionally, all series exhibit the presence of large outliers related to several causesEaster and trading day effects, many strikes, and abrupt changes in public transport supply at the end of the samplewhich considerably complicate the forecasting exercise. The estimation results indicate that the effects of these input variables have the expected signs and are highly significant from a quantitative point of view. However, their effects change considerably for different types of tickets and transportation modes.

Several periods-ahead forecasts (1, 6, 12, and 24 months) for the 6 variables considered in this paper were obtained for 2000, 2001, and 2002. The forecasting for this period was particularly difficult given the major increase in Metro services just before this period. This meant a mixture of effects that influenced not only total demand but, also, temporary passenger shifts among different modes.

We tried to make this forecasting exercise as complete as possible by adopting a large variety of error measures. Forecast accuracy was assessed using four different aggregate measures as well as variants of the so-called model-free testing procedures. Recent developments have shown that these tests are relevant in a wide variety of circumstances (nonquadratic and asymmetric loss functions, serially correlated errors, and non-Gaussian distributions) where previous tests were not applicable. As our forecasting exercise illustrates, they have been very valuable in assessing statistical significance of the differences found between the individual modeling alternatives. We also looked at the accuracy gain derived from pooling the individual forecasts. Although the gain is not uniform for all variables and forecasting horizons, it is almost uniform for all variables in the case of one-step-ahead forecasts and tends to decrease as the forecasting horizon grows. For the most important variables in this dataset (TC and JTC), however, the pooling alternative seems to work well at all forecasting horizons. Finally, we have shown how careful attention is needed in distinguishing between in-sample and truly out-of-sample comparisons when assessing forecast accuracy.

### ACKNOWLEDGMENTS

The authors are very grateful to A. Martín Arroyo, three anonymous referees, and Guest Editors Keith Ord and Peg Young for very helpful comments and suggestions that have led to a much better paper. We are also grateful to the Consorcio de Transportes de Madrid for providing the data. This research was sponsored by Comunidad Autónoma de Madrid, contract grant 06-0170/2000 and the Spanish Ministery of Science and Technology, DGICYT contract grant BEC2002-00081.

### REFERENCES

Armstrong, J. and F. Collopy. 1992. Error Measures for Generalizing About Forecasting Methods: Empirical Comparisons. *International Journal of Forecasting* 8:6990.

Baumgartner, J.P. 1995. Beyond Transportation Forecasting. *International Journal of Transportation Economics* 22:314.

Box, G.E.P. and G.C. Tiao. 1975. Intervention Analysis with Applications to Economic and Environmental Problems. *Journal of the American Statistical Association* 70:7079.

Bujosa, M., A. García-Ferrer, and P.C. Young. 2002. An ARMA Representation of Unobserved Component Models Under Generalized Random Walk Specifications: New Algorithms and Examples. Mimeo.

Carbajo, J.C. 1988. The Economics of Travel Passes. *Journal of Transport Economics and Policy* 22:153173.

Diebold, F.X. 2003. *Elements of Forecasting.* Mason, OH: Thomson/South-Western.

Diebold, F.X. and R.S. Mariano. 1995. Comparing Predictive Accuracy. *Journal of Business and Economic Statistics *13:253263.

Fildes, R. 1992. The Evaluation of Extrapolative Forecasting Methods. *International Journal of Forecasting* 8:8198.

FitzRoy, F. and I. Smith. 1999. Season Tickets and the Demand for Public Transport. *Kyklos* 52:219238.

García-Ferrer, A., M. Bujosa, A. de Juan, and P. Poncela. 2003. Demand Forecast and Elasticities Estimation of Public Transport. Mimeo.

García-Ferrer, A. and R.A. Queralt. 1997. A Note on Forecasting International Tourism Demand in Spain. *International Journal of Forecasting* 13:539549.

Gillen, D.W. 1977. Alternative Policy Variables to Influence Urban Transport Demand. *Canadian Journal of Economics* 10:686695.

Harvey, A. 1989. *Forecasting Structural Time Series Models and the Kalman Filter,* 2nd edition. Cambridge, UK: Cambridge University Press.

Harvey, D.I., S.J. Leybourne, and P. Newbold. 1997. Testing the Equality of Prediction Mean Squared Errors. *International Journal of Forecasting* 13:281291.

Kitagawa, G. 1981. A Nonstationary Time Series Model and Its Fitting by a Recursive Algorithm. *Journal of Time Series Analysis* 2:103116.

Lave, C.A. 1994. A Behavioral Approach to Modal Split Forecasting. *The Economics of Transport,* vol. 1. Edited by H. Mohring. Aldershot, England: Elgar Publications. 215232.

Lythgee, W.F. and M. Wardman. 2002. Demand for Rail Travel to and from Airports. *Transportation* 29(2):125143.

Mariano, R.S. 2002. Testing Forecast Accuracy. *A Companion to Economic Forecasting.* Edited by M. Clements and D. Hendry. Oxford, England: Blackwell Publishers. 284297.

Matas, A. and J.L. Raymond. 2003. Demand Elasticity on Tolled Motorways. *Journal of Transportation and Statistics* 6(2/3):91108.

Newbold, P. and D.I. Harvey. 2002. Forecast Combination and Encompassing. *A Companion to Economic Forecasting.* Edited by M. Clements and D. Hendry. Oxford, England: Blackwell Publishers. 268283.

Pucher, J. and S. Kurth. 1996. Verkehrsverbund: The Success of Regional Public Transport in Germany, Austria and Switzerland. *Transport Policy* 2:279291.

Peña, D. 2001. Outliers, Influential Observations, and Missing Data. *A Course in Time Series Analysis.* Edited by D. Peña, G.C. Tiao, and R.S. Tsay. New York, NY: John Wiley. 136170.

Slavin, H. 1996. An Integrated Dynamic Approach to Travel Demand Forecasting. *Transportation* 23(3):313350.

West, M. and J. Harrison. 1997. *Bayesian Forecasting and Dynamic Linear Models,* 2nd edition. New York, NY: Springer-Verlag.

Young, P.C. and C. Ng. 1989. Variance Intervention. *Journal of Forecasting* 8:399416.

Young, P.C., D. Pedregal, and W. Tych. 1999. Dynamic Harmonic Regression. *Journal of Forecasting* 18:369394.

### APPENDIX A: BGF ALGORITHM

In the DHR model, *T _{t}* and

*S*consist of a number of DHR components, , with the general form

_{t},

where *p _{j}* and

*ω*= 2*

_{j}*π / p*are the period and the frequency associated with each

_{j}*j*-th DHR component respectively.

*T*is the zero frequency term , while the seasonal component is

_{t}where *j* = 1,...,*R* are the seasonal frequencies. Hence, the complete DHR model is

The oscillations of each DHR component, , are modulated by and , which are stochastic time-variable parameters that follow an AR(2) stochastic process with at least one unit root

therefore, nonstationarity is allowed in the various components.

This DHR model can be considered a straightforward extension of the classical harmonic regression model, in which the gain and phase of the harmonic components can vary as a result of estimated temporal changes in the parameters and .^{7}

The method for optimizing the hyper-parameters of the model (i.e., the variances

of the processes *ξ _{j}*,

*j*= 0...,

*R*, and the variance

*σ*of the irregular component) was formulated by Young et al. (1999) in the frequency domain and is based on expressions for the pseudospectrum of the full DHR model:

^{2}_{e}where are the pseudospectra of the DHR components , and is the variance of the irregular component. The optimization processes seek the vector **σ**^{2} that minimizes^{8}

,(19)

where *ƒ _{y}*(

*ω*) is the spectrum of the observed time series. The DHR components follow nonstationary ARMA processes; therefore, in order to find an ordinary least squares (OLS) solution for equation (19), the unit modulus AR roots of

*ƒ*(

_{dhr}*ω*,

**σ**^{2}) need to be eliminated. The DHR components are stochastic processes of the form

and the pseudospectrum of the complete DHR model is

If equation (19) is multiplied by the function

**Ψ**(*ω*) = **Φ**(*e ^{- i ω}*)

**Φ**

_{j}(

*e*),

^{i ω}where **Φ**(*B*) is the minimum order polynomial with all unit modulus AR roots of the complete DHR model, the algorithm minimizes

(20)

but, because in equation (20) all the unit modulus AR roots cancel, the minimization problem can be solved by OLS.

Finally, *ƒ _{y}*(

*ω*) can be substituted in equation (20) by the estimated AR spectrum

,

where *φ _{y}*(

*B*) is an AR polynomial fitted to the series, and is the residual variance of the fitted AR model. The size, shape, and location of the spectral peaks of are used to identify the models of each of the DHR components .

Once the models of the DHR components have been identified, and the hyper-parameters

have been optimized by OLS, the DHR components *T _{t}*,

*S*, and

_{t}*e*can be estimated using the Kalman Filter and Fixed Interval Smoothing.

_{t}### ADDRESS FOR CORRESPONDANCE AND END NOTES

**Authors' Addresses:**

Corresponding Author: Pilar Poncela, Departamento de Análisis Económico: Economía Cuantitativa, Universidad Autónoma de Madrid, 28049 Cantoblanco, Madrid, Spain. E-mail: pilar.poncela@uam.es

Marcos Bujosa, Departamento de Fundamentos del Análisis Económio II, Facultad de Ciencias Económicas, Universidad Complutense de Madrid, Campus de Somosaguas, 28223 Madrid, Spain. Email: marcos.bujosa@ccee.ucm.es

Antonio García-Ferer, Departamento de Análisis Económico: Economía Cuantitativa, Universidad Autónoma de Madrid, 28049 Cantoblanco, Madrid, Spain. E-mail: antonio.garcia@uam.es

Aránzazu de Juan, Departamento de Análisis Económico: Economía Cuantitativa, Universidad Autónoma de Madrid, 28049 Cantoblanco, Madrid, Spain. E-mail: aranzazu.dejuan@uam.es

KEYWORDS: Public transport forecasts, forecast accuracy, pooling forecasts.

^{1.} At the end of 2001, the Metro/bus share represented 72% of total passengers.

^{2.} This behavior can be explained by the fact that subway transfers are free while bus transfers are not, thus penalizing the user.

^{3.} The estimated coefficients represent the variations in log *y _{t}* due to the strike. For instance, the estimated percentage fall in SBT in February 1992 (estimated coefficient 1.6170) is computed as

*e*

^{1.6170} 1 = 0.8015.

^{4.} As a matter of fact, when Δ*PTC* is removed from the equation, the own-price elasticity for 10MT goes down to 1.10. A word of caution applies equally to the cross-effects results between 10MT and TC.

^{5.} As a matter of fact, *MRL _{i}* is never statistically significant in the single- and 10-ride equations. It only becomes significant for TC and JTC when the estimation period ends in December 1999 and later, due to the abrupt changes in

*MRL*during this last period.

_{i }^{6.} Among the variables associated with service changes, the Metro route length is primarily responsible for the demand changes experienced in the Madrid metropolitan area, particularly in 2000. This variable remains unchanged from 1988 to 1993 and changes very little (4%) from 1994 to 1997. However, it shows a huge increase (43%) during the last months of 1998 and throughout 1999, and zero growth during 2000 and 2001.

^{7.} The main difference between the DHR model and related techniques, such as Harvey's structural model (Harvey 1989), lies in the formulation of the UC model for the periodic components and the method of optimizing the hyper-parameters.

^{8.} Young et al. (1999) modified the problem using the residual variance from the fitted AR model as an estimation of , and then dividing by , so to seek the vector **NVR** = [ 1, *NVR*_{0}, ..., *NVR*_{R} ], where .