Estimating the Impact of Recent Interventions on Transportation Indicators
Whenever an unusual event disrupts the structural patterns of a time series, one of the aims of a forecaster is to model the effects of that event, with a view to establishing a new basis for forecasting. Intervention analysis has long been the method of choice for such adjustments, but it is often represented as a procedure for dealing with events in the middle of the time series rather than for the most recent observations. In this paper, we develop a method, termed the three-intervention approach, to provide a flexible solution to this problem. We examine its application for a number of transportation series that were disrupted by the tragic events of September 2001. Analyses of the series using up to six months of post-event data show good agreement with results based on longer post-event series, and suggest that the proposed method will often provide adequate modifications to a series in a timely manner. The method is applicable to most economic time series, but has been tested only for transportation series.
The time that elapses between the occurrence of events and the production of the statistical records that describe them will always be too long. The production speed of monthly series will continue to improve in the area of transportation as elsewhere, but time lags in production are inevitable. Thus, when new data become available, it is important to remove any distortions caused by recent events so that we can both understand what has happened and predict future developments. Our ability to do so is jeopardized whenever the latest values do not provide a clear indication of the true situation. Common examples include the disruption of travel patterns because of extreme weather conditions or a loss of service due to a labor dispute. We refer to such effects as interventions in the time series, which may return to its previous level rapidly, slowly, or not at all. In such circumstances, the data may be misleading and require adjustment before underlying trends can be discerned.
When an intervention occurs some months in the past, data are available on either side of the affected month(s), and we may use the more conventional methods of intervention analysis to make adjustments (see, e.g., DeLurgio 1998, chapter 12 or Harvey 1989, section 7.6). In these circumstances, the nature of the intervention can usually be identified and it remains only to estimate the model parameters.
A rather different problem arises when the intervention has just taken place. The same general methods are appropriate but the amount of data available to describe the event is necessarily very limited. Further, the nature of the change (e.g., permanent or temporary) may be uncertain. Nevertheless, we wish to ascertain the nature of the change and to estimate its impact as quickly as possible so that series predictability may be restored. This paper develops such an early response system and tests its performance empirically. In the next section we describe the basic ideas, and the following section describes their implementation in a structural modeling framework. After that we present the analysis of a single series to illustrate ideas. The general approach and results for a number of transportation series are then summarized and discussed. The paper concludes with final comments on the proposed form of intervention analysis.
AN INTERVENTION ANALYSIS FRAMEWORK
The basic ideas for monitoring a process over time are central to statistical process control (SPC). The use of time-series modeling in SPC follows from the seminal work of Alwan and Roberts (1988, 1995). In SPC, we conventionally distinguish two sources of variation (see Alwan 2000, pp. 217-220):
- Common cause variation reflects the natural variation inherent in the process, and
- Special (or assignable) cause variation is any variation in the process introduced by a recognizable factor (e.g., a worn tool or a poorly trained operator).
In the present context, we are interested in identifying recent changes, and the possible types of assignable cause need to be identified more clearly. Thus, it is useful to divide assignable cause variation into three categories, which we may examine by different means:
- Additive outlier (AO): A factor has a short-term temporary impact on the series, which is resolved within a single observational period. The series then returns to its original state. For example, the effects of a blizzard on airplane traffic would typically be of this nature.
- Temporary change (TC): A factor has a relatively short-term impact on the series, which returns to its previous state over a number of time periods. For example, a prolonged strike in an industry will reduce production, which gradually recovers over the next several months.
- Level shift (LS): A factor causes the series to shift to a new level, and the series stays at that new level. For example, a change in the law on seat belts will lead to a shift in the number of fatalities.
The emphasis in these three types of intervention is on a sudden change in the series, and such changes are the basis of our present study. By contrast, it is possible to observe slowly changing conditions that may lead to fundamental changes in the series of interest. For example, improved engine design might produce greater fuel efficiency ratings for automobiles, but such an effect would be seen only very gradually in an aggregated series on average miles per gallon. Such changes will be incorporated into the trend terms of our models and so are not identified directly. Finally, we note that the seasonal pattern in the series may vary over time. For example, airlines may alter their seasonal pricing strategies, which would lead to a shift in travel patterns. Both these effects are important for longer term predictability, but they are less critical in the shorter term.
We make the three assignable causes operational in accordance with the definitions in table 1, where we assume that the event takes place at time T, and Xt is an indicator (or dummy) variable that indicates the timing of the event. Thus, in the simplest case of a series that has a constant mean, μ, over time with random disturbances, εt, except for interventions, the series yt would be modeled as:
$$y_t = \mu + X_t I + \varepsilon_t \tag{1}$$
where I measures the magnitude of the intervention.
Figures 1, 2, and 3 provide graphical examples of the AO, TC, and LS, respectively. A major question with the TC is the rate of adjustment. Indeed, the AO may be viewed as a TC with adjustment factor, d = 0, or sufficiently small to disappear within one time period. Likewise, an LS may be viewed as a TC with d approaching 1.0. Keep in mind that we will typically have a very limited amount of data with which to estimate these effects, and that the direct estimation of adjustment rates is difficult even when we have a considerable number of observations after the intervention (see Box and Tiao 1975). Although many interventions are unique in nature, the notion of the time taken to recover from the effect is usually quite well understood by those in the industry. Thus, subject matter specialists can sometimes provide reasonable estimates of the half-life for a new intervention, even though the magnitude of the effect cannot be reliably assessed in advance.
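As a minimal sketch of the three indicator variables, the regressor Xt could be generated as follows. The function name and the zero-based index T are illustrative assumptions, and the definitions are the standard AO, TC, and LS forms, which we assume match table 1. The final lines illustrate the limiting cases just described: a TC with d = 0 reduces to an AO, and a TC with d = 1 coincides with an LS.

```python
def intervention_regressor(kind, n, T, d=0.7):
    """Return the regressor X_t for t = 0, ..., n-1, for an event at index T.

    kind: "AO" (one-period pulse), "TC" (effect decaying geometrically
    at rate d), or "LS" (permanent step). All names are illustrative.
    """
    if kind == "AO":
        return [1.0 if t == T else 0.0 for t in range(n)]
    if kind == "TC":
        return [d ** (t - T) if t >= T else 0.0 for t in range(n)]
    if kind == "LS":
        return [1.0 if t >= T else 0.0 for t in range(n)]
    raise ValueError(f"unknown intervention type: {kind}")


# A TC with d = 0.5 starting at index 1 halves its weight each period.
print(intervention_regressor("TC", 5, 1, d=0.5))  # [0.0, 1.0, 0.5, 0.25, 0.125]

# Limiting cases: d = 0 gives the AO pattern, d = 1 gives the LS pattern.
print(intervention_regressor("TC", 6, 2, d=0.0) == intervention_regressor("AO", 6, 2))
print(intervention_regressor("TC", 6, 2, d=1.0) == intervention_regressor("LS", 6, 2))
```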
Chen and Liu (1993) recommend d = 0.7 as a convenient choice, which results in a half-life of the TC of about three months. That is, in months 1, 2, and 3, the weights assigned to the TC are 1.0, 0.7, and 0.49, reducing to about half the starting weight. Likewise, a value of d = 0.8 corresponds to a half-life of about four months and d = 0.9 to a half-life of just over seven months. A value of d greater than 0.9 becomes almost indistinguishable from a level shift in the short term. Conversely, a value appreciably less than 0.7 may be represented by a one- or (at most) two-period AO.
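The half-life arithmetic can be checked with a short sketch. The function names are hypothetical, and the intervention month is counted as month 1, so that the weight in month k is d raised to the power k - 1.

```python
import math


def tc_weights(d, months):
    """Weights applied to the TC effect in months 1, 2, ..., `months`."""
    return [d ** k for k in range(months)]


def months_to_half_weight(d):
    """Month (intervention month = month 1) at which the weight
    d ** (month - 1) has fallen to one half of its starting value."""
    return 1.0 + math.log(0.5) / math.log(d)


for d in (0.7, 0.8, 0.9):
    first_three = [round(w, 2) for w in tc_weights(d, 3)]
    print(d, first_three, round(months_to_half_weight(d), 2))
```

Under this counting convention the three rates give roughly 2.9, 4.1, and 7.6 months, in line with the half-lives of about three, about four, and just over seven months quoted above.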
Based both on the argument of Chen and Liu (1993) and observations about likely recovery times from transportation analysts, we used d = 0.7 in our empirical study of monthly series. As a general matter, an investigator should pay attention to both the phenomenon under study and the frequency with which the data are recorded.
THE STRUCTURAL TIME-SERIES MODEL
Although terms such as "trend" and "seasonal" are intuitively appealing, they are mental constructs because we cannot observe them directly. Therefore, we use a structural modeling approach that treats them as unobserved components (Harvey 1989; Harvey and Shephard 1993). In the empirical work, we used the STAMP (Structural Time Series Analyser, Modeller, and Predictor) software in conjunction with GiveWin (for details, see Koopman et al. 2000).
The trend is the long-run component in the series; it designates the general direction in which the series is moving. The trend consists of two parts: the level (which is the current value of the trend) and the slope (which represents the change in the level from one period to the next). Both the level and the slope may be either fixed or evolve over time. A slope may or may not be present depending on the nature of the phenomenon being studied. The seasonal component represents variations over the year, such as increased traffic during the summer. Again, a seasonal component may or may not be present and, if present, may be fixed or evolve over time. The irregular component represents the unexplained variation in the series. We define the components at time t as follows: level = μt ; slope = βt ; seasonal component = γt ; and irregular component = εt . We assume that the process is observed at unit time intervals (t, t+1,...) and that there are s such intervals in a year (e.g., s = 12 for monthly data). We then allow each component to evolve over time according to the specifications:
$$\mu_t = \mu_{t-1} + \beta_{t-1} + \eta_t \tag{2}$$
$$\beta_t = \beta_{t-1} + \xi_t \tag{3}$$
$$\gamma_t = -\gamma_{t-1} - \gamma_{t-2} - \cdots - \gamma_{t-s+1} + \omega_t \tag{4}$$
Equation (4) provides the dummy variable form of the seasonal component (the reader is directed to Koopman et al. (2000) for the trigonometric formulation of the seasonal component).
The quantities ηt , ξt , and ωt represent zero mean, random shifts in the corresponding component. We assume such shifts to be independent of one another and uncorrelated over time; we also assume that they are independent of the "irregular" component, εt , seen in equation (5) below. Equations (2) through (4) are known as the state or transition equations, because they describe the underlying states of the process or the transition of the components from one time period to the next.
Equations (2) and (3) together provide a general framework for describing the evolution of the trend. If the process being modeled does not require all of these components, they can be dropped from the specification. The components are tested in sequential fashion as follows (Harvey 1989, pp. 248-256):
- Does the slope disturbance term have positive variance? (Zero variance corresponds to the slope being fixed over time.)
- If the slope disturbance term has a zero variance, does the slope parameter estimate significantly differ from zero? (An insignificant slope coefficient having a slope disturbance term with a zero variance indicates that the slope term should be dropped from the model.)
- Does the level disturbance have positive variance? (Zero variance corresponds to the mean level being fixed over time.)
If all three statistical tests produced negative outcomes, the overall trend term would be reduced to a constant.
When the time series is seasonal, we check the following:
- Does the seasonal disturbance term have positive variance? (Zero variance corresponds to a stable seasonal pattern.)
- If the seasonal disturbance term has a zero variance, are the seasonal components significantly different from zero? (Is there any seasonal pattern? Should seasonality be dropped from the model?)
If the seasonal disturbance term has zero variance but the seasonal components are significantly different from zero, we are left with a "classical" model with fixed seasonal components. If the seasonal pattern is rejected completely, we reduce the model purely to its trend components.
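The sequence of checks above can be summarized as a small decision rule. This is only a sketch of the logic as described in the text, not the procedure implemented in STAMP; the function names and the Boolean inputs (the outcomes of the individual tests) are assumptions made for illustration.

```python
def trend_form(slope_var_positive, slope_significant, level_var_positive):
    """Map the outcomes of the three trend checks to a (level, slope) form."""
    level = "stochastic level" if level_var_positive else "fixed level"
    if slope_var_positive:
        slope = "stochastic slope"
    elif slope_significant:
        slope = "fixed slope"
    else:
        slope = "no slope"
    return level, slope


def seasonal_form(seasonal_var_positive, seasonals_significant):
    """Map the outcomes of the two seasonal checks to a seasonal form."""
    if seasonal_var_positive:
        return "stochastic seasonal"
    return "fixed seasonal" if seasonals_significant else "no seasonal"


# All three trend checks negative: the trend reduces to a constant.
print(trend_form(False, False, False))   # ('fixed level', 'no slope')
# Zero seasonal variance but significant effects: the "classical" fixed form.
print(seasonal_form(False, True))        # fixed seasonal
```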
The observed series is related to the state of the system by the observation (or measurement) equation:
yt = μt + γt + εt (5)
where εt denotes the irregular component. The irregular component has zero mean and is assumed to be serially uncorrelated (i.e., not predictable) and independent of the disturbances in the state equations.
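To make the roles of equations (2) through (5) concrete, the following sketch simulates a series from the model. It is an illustration only: the function name, the default standard deviations, the starting level and slope, and the Gaussian disturbances are all assumptions, not values taken from the series analyzed in this paper.

```python
import random


def simulate_bsm(n, s=12, sd_eps=1.0, sd_eta=0.5, sd_xi=0.05, sd_omega=0.1,
                 level0=100.0, slope0=0.2, seed=0):
    """Simulate n observations from equations (2)-(5) with Gaussian shocks."""
    rng = random.Random(seed)
    # Arbitrary starting seasonal pattern: s effects centered to sum to zero.
    raw = [rng.gauss(0.0, 5.0) for _ in range(s)]
    mean_raw = sum(raw) / s
    gammas = [g - mean_raw for g in raw]   # gammas[-1] is the current effect
    level, slope = level0, slope0
    y = []
    for _ in range(n):
        # Observation equation (5): y_t = mu_t + gamma_t + eps_t
        y.append(level + gammas[-1] + rng.gauss(0.0, sd_eps))
        # Seasonal equation (4): minus the sum of the last s-1 effects, plus omega
        gammas.append(-sum(gammas[-(s - 1):]) + rng.gauss(0.0, sd_omega))
        # Level equation (2) uses the previous slope, then slope equation (3)
        level = level + slope + rng.gauss(0.0, sd_eta)
        slope = slope + rng.gauss(0.0, sd_xi)
    return y


series = simulate_bsm(48)
print(len(series), round(min(series), 1), round(max(series), 1))
```

Setting a standard deviation to zero fixes the corresponding component: with all four set to zero and a zero slope, the simulated series repeats its seasonal pattern exactly every s periods.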
Estimation proceeds by maximum likelihood (Harvey 1989, pp. 125-128). Operational details are provided in Koopman et al. (2000, section 8.3). The key parameters are the four variances corresponding to the disturbance terms ($\sigma_\varepsilon^2$, $\sigma_\eta^2$, $\sigma_\xi^2$, and $\sigma_\omega^2$). Note that we assume these variances are constant over time; the time series may need to be transformed to justify this assumption, at least to a reasonable degree of approximation. The four variance terms control the form of the model, allowing each component of level, slope, and seasonality to be stochastic or fixed; slope and seasonal elements may be present or absent. Table 2 illustrates the principal variations. If fixed components are included in a model, the corresponding terms appear in the state equations (e.g., fixed seasonal coefficients), but the variance term is zero. If the components are stochastic, the same terms appear in the model, but the variance is strictly positive. The most general form is the Basic Structural Model (BSM), in which all components are stochastic. The BSM forms the starting point for the model development process and is the standard form employed in STAMP. The program then "tests down" to eliminate any components that are not required.
ANALYSIS OF LATE ARRIVAL OF SCHEDULED FLIGHTS
By way of illustration, we consider an example that has received considerable publicity in recent years, namely late arrival of flights, or airline delays. The particular series examined in this section describes the monthly percentage of scheduled flights for major U.S. air carriers not arriving on time, or the Late Arrivals time series. A plot of this series, for the period September 1987 to February 2002, is shown in figure 4. The tragic events of September 2001 changed many lives in fundamental ways and also had a serious effect on the level of activity in the airline industry. Therefore, we will analyze the series initially only up to August 2001 and consider the aftermath of the terrorist attacks in the next section.
An initial set of interventions (prior to September 2001) was identified using information provided by the Bureau of Transportation Statistics (published in a report known as Transportation Indicators), combined with an initial analysis using the AUTOBOX software (produced by Automatic Forecasting Systems; see endnote 1).
Two significant pulses, or AOs, were found for this series within the time period of September 1987 through August 2001: January 1996 and December 2000. These two interventions were weather-related and were incorporated into the STAMP modeling process. Our analysis of the series, using STAMP, revealed that the most appropriate model included a stochastic level, no slope, and fixed seasonal components. This model yields the outputs shown in figures 5, 6, 7, and 8. Figure 5 shows the smoothed trend, the seasonal components, and the irregular component. The smoothed versions use the entire series to construct the trend, seasonal, and irregular components; they are a better choice for gaining a perspective on the evolution of the series, because the estimates use observations both before and after the time period in question.
When these plots are compared with the filtered (or forecast) components in figure 6, the increased roughness of the trends in the latter set becomes evident. The seasonal pattern in the filtered series also changes over time, because the filtered components use only the observations up to the current time in each set of calculations. Thus, the filtered components are directly useful for prediction purposes, and we use only these components in subsequent analyses.
As noted earlier, our initial analysis was based on the data from September 1987 through August 2001. Now that the model has been specified, the holdout sample of data from September 2001 through February 2002 is placed back into the dataset, and the same model is fitted to the full set of data. Figure 7 shows the standardized residuals for the full fitted series and highlights the impact of the events of September 11, 2001. The horizontal lines correspond to ± 2 standard deviations, and the plot may be thought of as a Shewhart chart (see endnote 2). As expected, the chart indicates a sharp rise in the percentage of late arrivals in September, followed by a major decline in late arrivals in October 2001, due primarily to reduced traffic levels.
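The idea of charting standardized residuals against ± 2 limits can be sketched for the simplest case, a local level model with known variances and no seasonal component. This is not the STAMP computation: the function names, the variance values, and the artificial series below are assumptions chosen only to show the mechanics of filtering and flagging.

```python
import math


def local_level_filter(y, var_eps, var_eta, p0=1e7):
    """Kalman filter for y_t = mu_t + eps_t, mu_t = mu_{t-1} + eta_t.

    Returns the filtered levels and the standardized one-step-ahead
    prediction errors. The large p0 acts as a vague prior on the level.
    """
    a, p = y[0], p0                # predicted level and its variance
    filtered, std_resid = [], []
    for obs in y:
        f = p + var_eps            # variance of the prediction error
        v = obs - a                # one-step-ahead prediction error
        std_resid.append(v / math.sqrt(f))
        k = p / f                  # Kalman gain
        a = a + k * v              # level estimate using data up to time t
        p = p * (1.0 - k)
        filtered.append(a)
        p = p + var_eta            # variance of the prediction for t + 1
    return filtered, std_resid


def out_of_limits(std_resid, limit=2.0):
    """Indices whose standardized residual lies outside +/- limit."""
    return [t for t, r in enumerate(std_resid) if abs(r) > limit]


# Artificial series: a flat level of 20 with a one-period spike at index 30.
series = [20.0] * 30 + [35.0] + [20.0] * 5
levels, resid = local_level_filter(series, var_eps=1.0, var_eta=0.1)
print(out_of_limits(resid))
```

In this artificial example the spike is flagged at once, and the following residuals are also pushed outside the limits because the filtered level has partly absorbed the spike. That carry-over is the kind of distortion the explicit intervention terms are meant to remove.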
The automated analysis in STAMP suggests an outlier in September 2001 and a level shift in October 2001. Using these interventions, the final fitted trend for the full set of data is shown in figure 8. Although the overall performance appears to be satisfactory, the further declines in the trend after October 2001 seem inconsistent with the model and suggest the need for further analysis. This problem is reflected more clearly in later analyses (seen in figures 13 and 14). Since data for such a modeling exercise are necessarily very limited, we need to use our judgment on likely future developments, as illustrated in the next section when we use the framework developed earlier.
GENERAL APPROACH TO INTERVENTION MODELING
We applied the framework developed earlier within the following context. After September 11, 2001, the airlines experienced massive disruptions in their schedules. In October and later months, the overall operating system gradually returned to normal, but passenger traffic resumed at lower levels than prior to the attack. This sequence of events may be represented by the following set of interventions:
- A purely transient effect (AO) relating to the month of September only.
- A temporary change or shift (TC) that started in October 2001 and gradually disappeared. We could have started this effect in September, but felt that October provided a simpler interpretation. As noted earlier, we used d = 0.7 in all cases.
- A permanent effect (LS) that changed all mean values of the series from November 2001 on. Again, note that we could have started this factor in September or October, but we felt that the present construction affords a simpler interpretation by separating out the start dates of the three interventions. Provided all three interventions are retained in the model, the particular choice of starting dates will not affect the fitted or forecast values in the series.
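The three interventions listed above can be laid out as regressors for the six months from September 2001 to February 2002. The sketch below is illustrative only: the function names are hypothetical, and the magnitudes passed to `combined_effect` are invented to show the shape of the combined response, not estimates from any of the series studied here.

```python
MONTHS = ["2001-09", "2001-10", "2001-11", "2001-12", "2002-01", "2002-02"]
D = 0.7  # TC adjustment rate used throughout the paper


def three_intervention_regressors(months, d=D):
    """Rows of (month, AO, TC, LS) with the start dates described in the text."""
    ao_at = months.index("2001-09")   # AO: September 2001 only
    tc_at = months.index("2001-10")   # TC: starts October 2001, decays at rate d
    ls_at = months.index("2001-11")   # LS: November 2001 onward
    rows = []
    for t, label in enumerate(months):
        ao = 1.0 if t == ao_at else 0.0
        tc = d ** (t - tc_at) if t >= tc_at else 0.0
        ls = 1.0 if t >= ls_at else 0.0
        rows.append((label, ao, tc, ls))
    return rows


def combined_effect(rows, i_ao, i_tc, i_ls):
    """Total intervention effect per month for given magnitudes."""
    return [(label, i_ao * ao + i_tc * tc + i_ls * ls)
            for label, ao, tc, ls in rows]


# Hypothetical magnitudes: large transient drop, decaying temporary change,
# and a smaller permanent shift.
rows = three_intervention_regressors(MONTHS)
for label, effect in combined_effect(rows, i_ao=-10.0, i_tc=-5.0, i_ls=-1.0):
    print(label, round(effect, 2))
```

With negative magnitudes of this kind the combined effect shows an initial sharp drop followed by a gradual, partial recovery toward the permanently shifted level.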
We considered the following five series, primarily selected from the air transportation sector, since this was the mode of transportation most affected:
- Late arrivals: percentage of scheduled flights by major U.S. carriers not arriving on time.
- Cancellations: percentage of scheduled flights by major U.S. carriers that were canceled.
- Domestic enplanements: number of passengers boarding domestic aircraft (millions).
- Air revenue passenger-miles: revenue-earning miles flown by passengers on major U.S. carriers (billions).
- Rail revenue passenger-miles: revenue-earning passenger-miles carried by Amtrak and the Alaska Railroad (millions).
The late arrivals series was illustrated in figure 4; figures 9, 10, 11, and 12 provide graphs of the other series being examined. In all cases, the data are available on the U.S. Department of Transportation website (http://www.dot.gov). The first four series are collected by the Office of Airline Information in the Bureau of Transportation Statistics (also available at http://www.bts.gov), and the fifth is produced by the Federal Railroad Administration.
In our original analysis of these data (Young and Ord 2002), we were able to use only a small number of observations post-intervention (table 3). In this paper, we report those initial analyses recomputed using the data as later revised by the agencies. It should be noted that the different time periods used in the analyses reflect data availability at the times the analyses were completed (May 2002 and January 2003). These inevitable delays serve to underscore the importance of timely and reliable adjustments to series after interventions.
Our procedure was as follows:
1. Develop a model for the series up to August 2001, incorporating AO and LS outliers where needed (as for late arrivals in our earlier analysis).
2. Using the data available as of May 2002 (table 3), run the same model with AO, LS, and TC components as specified above and "test down" to eliminate insignificant coefficients. This analysis was performed initially in early June 2002 (Young and Ord 2002) and minor differences in the results are reported only to the extent that changes occurred in the reported series after that time.
3. Using the data available as of January 2003 (table 3), run the same model with AO, LS, and TC components as specified above and test down to eliminate insignificant coefficients.
4. Use the models developed in steps 2 and 3 to generate successive one-step-ahead forecasts for the most recent data to see if the earlier analysis (step 2) provided an adequate description of the structural changes in the series.
The original models are summarized in table 4. The data revisions noted above did not lead to any changes in specification; the changes in the estimated coefficients were minor in all cases. The entries in table 4 are to be interpreted as in the following example for late arrivals:
State equations (stochastic level, no slope, fixed seasonals):
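$$\mu_t = \mu_{t-1} + \eta_t \tag{2a}$$
$$\gamma_t + \gamma_{t-1} + \cdots + \gamma_{t-s+1} = 0, \quad \text{i.e., the seasonal effects } \gamma_j \text{ are fixed} \tag{4a}$$
Observation equation:
$$y_t = \mu_t + \gamma_t + X_1 I_1 + X_2 I_2 + \varepsilon_t \tag{5a}$$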
where s denotes the number of seasons, γj denotes the parameter for the fixed seasonal effect in period j, and (X1, X2) denote the AO interventions at January 1996 and December 2000. Numerical details are omitted in the interests of space.
Because we had only between four and six observations in the initial study (from September to December or February depending on the series), it was only to be expected that the LS and TC estimates would be highly correlated, and the TC dropped out in three of the five series. What is remarkable is that when the analyses were re-run with the later observations included, the changes were minor in all cases. The results are given in table 5.
Several conclusions may be drawn from table 5:
- In all cases, large adverse effects were identified in September, as expected. Rail traffic was reduced as well as air traffic, because people were reluctant to travel at all.
- In all cases, the estimates based on the first analysis seem to provide adequate adjustments to the series.
- Cancellations and late arrivals showed negative level shifts reflecting the reduced amount of traffic in subsequent months. As airports gradually resumed normal operations, we might have expected these series to have resumed their earlier levels, but the initial estimates seem to have provided a reasonable assessment of the reactions. These effects are probably the result of less congestion as the result of lower traffic volumes.
- The temporary effects for enplanements and air revenue passenger-miles are about five times the size of the permanent level shift. All these effects were negative, indicating the adverse effect on the airline industry. Both the larger temporary effect and the smaller final impact seem to have been adequately recognized in the first analysis.
- The rail revenue passenger-miles series shows no change after the first month, which is consistent with the data and reflects the relative independence of the two markets.
- Although the details are not reproduced here, the diagnostics for each series indicated that the descriptions were consistent with the data available. Since the same set of three interventions was applied to each series, this provides some evidence that the descriptions are reasonable, although further data are clearly needed to validate that claim.
The results shown in table 5 indicate that the parameter estimates of the three-intervention terms based on limited data proved to be comparable with those estimates with more data points after the intervention. But does this model fit also imply comparable results when forecasting?
In order to study this issue, we chose to compare the quality of forecasts with the three-interventions (AO, TC, and LS) and without the three-interventions in the model. Figures 13 and 14 illustrate graphically the forecasts of air revenue passenger-miles based on the models without (figure 13) and with (figure 14) the three-intervention coefficients (forecast based on the model fitted on data ending December 2001).
If no interventions are incorporated in the model, the forecast for air revenue passenger-miles would be a continuing downward trend, whereas the incorporation of interventions identifies the initial downturn and the subsequent gradual though partial recovery. This pattern would, of course, eventually be identified without the intervention analysis. However, the modified trend is identified much more quickly and reliably when the three-intervention model is applied.
The "no-interventions" forecasts will provide a basis of comparison for the three-interventions forecasts with differing forecast origins (December 2001, March 2002, and June 2002). To measure the forecast accuracy of each model, we chose to calculate the Mean Absolute Percentage Error (MAPE) values:
$$\mathrm{MAPE} = \frac{100}{n} \sum_{t} \left| \frac{e_t}{y_t} \right|$$
where et is the forecast error for time t, yt is the observed value at time t, and n is the number of forecasts. The MAPE values were calculated for forecasts based on the two types of models (with and without interventions) from the three forecast origins (December 2001, March 2002, and June 2002). The summations cover the period from the forecast origin to the latest observation available (table 6). For each forecast origin, the MAPE values for the two types of models are then compared by creating a "Relative Value" (or RV) of those two MAPEs:
$$\mathrm{RV} = \frac{\mathrm{MAPE}_{\text{no interventions}}}{\mathrm{MAPE}_{\text{three-interventions}}} \tag{6}$$
The results of the RV calculations for the different forecast origins are shown in table 6.
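For readers who wish to reproduce this type of comparison, the MAPE and RV calculations can be sketched as follows. The function names are hypothetical and the numbers in the example are invented for illustration; they are not values from table 6.

```python
def mape(actual, forecast):
    """Mean absolute percentage error: (100 / n) * sum(|e_t / y_t|)."""
    if len(actual) != len(forecast) or not actual:
        raise ValueError("actual and forecast must be non-empty and equal in length")
    total = sum(abs((y - f) / y) for y, f in zip(actual, forecast))
    return 100.0 * total / len(actual)


def relative_value(actual, forecast_no_interventions, forecast_three_interventions):
    """RV as in equation (6): MAPE without interventions over MAPE with them."""
    return (mape(actual, forecast_no_interventions)
            / mape(actual, forecast_three_interventions))


# Invented example: observed values and two competing sets of forecasts.
observed = [40.0, 42.0, 45.0]
without = [36.0, 37.8, 40.5]      # each forecast is 10 percent too low
with_three = [38.0, 39.9, 42.75]  # each forecast is 5 percent too low
print(round(mape(observed, without), 2), round(mape(observed, with_three), 2))
print(round(relative_value(observed, without, with_three), 2))
```

An RV above 1 indicates that the three-interventions model produced the smaller MAPE over the comparison period, and an RV near 1 indicates that the two models forecast about equally well.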
We need to refer back to table 4 in order to understand the results in table 6. Late arrivals and rail revenue passenger-miles have fixed seasonal patterns, so that the model estimates for the seasonal components of these series are not affected by the intervention. The other three series have stochastic seasonals and the three-interventions model identifies the disruptions in their seasonal patterns more accurately than the no-interventions model, especially for the cancellations series (figure 9).
All five series have stochastic levels so that the intervention is gradually incorporated into the model structure over a period of months. Consequently, for the forecast origin with the fewest time periods after September 2001 (December 2001), the RVs show that the intervention approach provided more accurate forecasts for each of the five series. However, as more data values are obtained, those series that contain local level and fixed seasonal components seem to have corrected quickly and therefore do not show any longer term improvement through the intervention approach. Such effects have been noted in other forecasting studies (e.g., Makridakis and Hibon 2000) and reflect the adaptive nature of the models used.
The structural modeling approach does allow the model to adapt itself to changes in the data, but the incorporation of the interventions allows the model to react more quickly. For our five sets of data, the series seem to be tracked reasonably by the three-interventions model after three to four months, whereas it takes about six months or longer (especially for cancellations) for the no-interventions model to self-adjust. The importance of the proposed procedure lies in the ability to reduce the time required to discern the underlying trends after the intervention has occurred.
CONCLUSIONS
A key requirement in forecasting is that adjustments should be made for unusual events so that the series can be forecast on its new trajectory. In addition, we also seek to make changes that are reliable, yet quick to take effect. These requirements are especially important when the series has been subjected to a major intervention and we wish to identify the newly emerging trend.
The results of this study suggest that the flexible use of the three-interventions approach we have described provides adequate adjustments by three to four time periods after the event. This contrasts with a six-month or longer delay in self-adjustment even for a flexible model, and probably a longer time if a model-based procedure is not used at all. In addition, the three-interventions method enables us to provide an initial partitioning of the effects into short-term, transient, and permanent shifts, which is important for planning purposes. Furthermore, the structural technique employed to calculate these models can be easily updated as new data become available, so that the previous month's assessment of the shifts can be compared with the latest results and an assessment made of how quickly the system is returning to a new stable level after the intervention. We are cautiously optimistic that the proposed approach offers a way forward in dealing more expeditiously with interventions at the end of a time series.
REFERENCES
Alwan, L.C. 2000. Statistical Process Analysis. Boston, MA: Irwin/McGraw-Hill.
Alwan, L.C. and H.V. Roberts. 1988. Time-Series Modeling for Statistical Process Control. Journal of Business and Economic Statistics 6(1):83-95.
______. 1995. The Problem of Misplaced Control Limits. Applied Statistics 44(3):269-278.
Box, G.E.P. and G.C. Tiao. 1975. Intervention Analysis with Applications to Economic and Environmental Problems. Journal of the American Statistical Association 70:70-79.
Chen, C. and L.-M. Liu. 1993. Joint Estimation of Model Parameters and Outlier Effects in Time Series. Journal of the American Statistical Association 88:284-297.
DeLurgio, S.A. 1998. Forecasting Principles and Applications. Boston, MA: Irwin/McGraw-Hill.
Harvey, A.C. 1989. Forecasting, Structural Time Series Models, and the Kalman Filter. Cambridge, UK: Cambridge University Press.
Harvey, A.C. and N. Shephard. 1993. Structural Time Series Models. Handbook of Statistics, Volume 11. Edited by G.S. Maddala, C.R. Rao, and H.D. Vinod. Amsterdam, Netherlands: Elsevier Science.
Koopman, S.J., A.C. Harvey, J.A. Doornik, and N. Shephard. 2000. Stamp: Structural Time Series Analyser, Modeller, and Predictor. London, UK: Timberlake Consultants.
Makridakis, S. and M. Hibon. 2000. The M3 Competition. International Journal of Forecasting 16(4):451-476.
Young, P. and K. Ord. 2002. Monitoring Transportation Indicators, and an Analysis of the Effects of September 11, 2001, paper presented at the 22nd International Symposium on Forecasting, Dublin, Ireland.
ADDRESS FOR CORRESPONDENCE AND END NOTES
Corresponding Author: Keith Ord, The McDonough School of Business, Georgetown University, 37th and "O" Streets, NW, Washington, DC 20057. E-mail: email@example.com
Peg Young, Bureau of Transportation Statistics, U.S. Department of Transportation, 400 Seventh Street, SW, Room 3430, Washington, DC 20590. E-mail: firstname.lastname@example.org
KEYWORDS: Airline traffic, forecasting, intervention analysis, process control, structural time series, transportation indicators.
1. Information on AUTOBOX software can be found at http://www.autobox.com.
2. Shewhart charts are widely used in statistical process control to identify out-of-control conditions from which we would seek to identify assignable causes. The center line in figure 7 is equal to zero here because we are plotting regression residuals, and the vertical axis represents the number of standard deviations (SD) that an observation lies above or below the mean. In conventional use, a single observation that is more than three SD from the mean, or two successive observations more than two SD from the mean, is said to signal an out-of-control condition, in contrast to a state of statistical control, i.e., a stable system of random variation.