**Sandy Balkin
Keith Ord**

First, we would like to thank Johannes Ledolter, Andrew Harvey, and the team of Michael Fontaine, Tongbin Qu, Clifford Speigelman, and Karl Zimmerman (FQSZ) for their constructive and valuable comments. Without doubt, the issue at hand is a complex one, and the discussants have both carefully evaluated our study and suggested a number of directions for further research.

With the benefit of hindsight, we recognize that several of the discussants' comments respond to decisions and implicit assumptions that were not clearly stated in the paper. We begin with our choice of data. We agree that fatal crashes (fortunately) represent a small proportion of the accidents on our highways, but we chose to restrict attention to this set of statistics because the reporting of such events is more uniform. State-by-state requirements vary considerably for lesser incidents, and we hoped to use data that were reasonably comparable across states. However, we accept that "injury crashes" might have produced more stable results, at least within a particular state. A similar discussion is appropriate for the choice of interstates or primary roads. We felt that the classifications of urban and rural interstates were more uniform across the country than the definitions of primary roads. Nevertheless, a study for primary roads would also provide valuable insights, particularly on the question of differential responses by state.

Vehicle-miles traveled (VMT) was indeed a variable that we would have liked to use to produce an accident *rate* rather than a pure count. However, an assessment of the availability and quality of monthly VMT figures at the state level led us to the conclusion that such an analysis was not feasible for all states. Thus, we reluctantly decided to work with pure counts. An analysis of those states for which good data are available, building on the earlier work of Ledolter and Chan (1996), would certainly be worthwhile.

There were several comments concerning our definition of the intervention variables. First, we would like to make clear that we set the indicator as a step function; that is, it was scored as zero in the months prior to that state changing the speed limit and as one for the month when the change took place *in that state* and for all succeeding months. We recognize that there may well have been different levels of preparation for and compliance with the new limits in the states, but we did not have such information available. The references provided by FQSZ are helpful in exploring this question further. The precise coding of the interventions is difficult. Even when all increases are the same numerically, as in 1987, the proportions of interstate mileage affected vary by state, as do questions of enforcement. For example, in Pennsylvania during the era of the 55-mile-per-hour (mph) speed limit, conventional wisdom held that one would not normally be ticketed for speeding when traveling at 64 mph or less but that, during the era of the 65-mph limit, the "threshold" was raised only to 69 mph. Whether or not such folklore is true, it clearly has an impact on driving behavior.
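As an illustration only, the step-function coding described above can be sketched in a few lines of Python. The function name, the monthly index, and the effective month used here are hypothetical and are not taken from the study.

```python
from datetime import date


def step_indicator(months, change_month):
    """Return 0 for months before change_month and 1 from change_month onward."""
    return [1 if m >= change_month else 0 for m in months]


# Hypothetical monthly index for one state: January through December 1987.
months = [date(1987, m, 1) for m in range(1, 13)]

# Hypothetical month in which this state's new limit took effect.
change_month = date(1987, 5, 1)

indicator = step_indicator(months, change_month)
```

Because the effective month differs by state, each state would receive its own indicator series built from its own change date.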

Ledolter mentions their earlier study, which took into account changes in average traffic speed. Again, we were not able to find reliable monthly data on this variable for all states and so did not include it in the analysis. However, the Ledolter and Chan (1994, 1996) results suggest a gradual shift over time, whereas we hypothesized a sudden impact on accident rates. Thus, even if the data were available, we would expect the effects to be distinct. Of course, the statistical analysis would be more efficient if the average speed were taken into account.

When speed limit increases vary, should the intervention be scaled across states to match the amount of the increase? We note that such a scaling does not affect the statistical analysis for an individual state, provided separate indicators are used for each increase. As Ledolter notes, a case could certainly be made for using a single scaled indicator to cover the two increases, but the benefits of consolidating the "speed effect" need to be set against some of the "rival events" mentioned by FQSZ.
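The invariance to scaling can be checked numerically. The sketch below is a simplified stand-in: it uses an ordinary least-squares regression of counts on a single indicator rather than the structural time series models of the paper, and the counts and the 10-mph scale factor are made up for illustration. Multiplying the indicator by a constant divides the estimated coefficient by that constant and leaves the *t*-statistic unchanged.

```python
import math


def slope_and_t(x, y):
    """Least-squares slope of y on x (with intercept) and its t-statistic."""
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    b = sxy / sxx
    a = my - b * mx
    rss = sum((yi - a - b * xi) ** 2 for xi, yi in zip(x, y))
    se = math.sqrt(rss / (n - 2) / sxx)  # standard error of the slope
    return b, b / se


# Made-up monthly counts for illustration only; not data from the study.
counts = [12, 9, 11, 10, 13, 11, 15, 14, 16, 13, 17, 15]
step = [0] * 6 + [1] * 6          # unscaled 0/1 step indicator
scaled = [10 * s for s in step]   # same indicator scaled by a hypothetical 10-mph increase

b_step, t_step = slope_and_t(step, counts)
b_scaled, t_scaled = slope_and_t(scaled, counts)
```

Here `b_step` equals ten times `b_scaled`, while `t_step` and `t_scaled` coincide up to rounding, so the inference for a single state does not depend on the scale chosen.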

We are grateful to Ledolter for summarizing the linkages between ARIMA and structural models. Hopefully, this description will make the paper and ensuing discussion accessible to a wider audience. We agree that similar results would be expected, whichever paradigm is adopted. Likewise, we concur that the seasonal patterns were usually quite stable; indeed, for a number of states the analysis did indicate fixed seasonals, but we did not report those details. Our reason for using structural models rather than the more widely used ARIMA framework was that we feel the direct specification of level, slope, and seasonal components is more intuitively appealing and allows the investigator to incorporate prior knowledge into the model selection process more readily. Granted, the ARIMA models can be decomposed into components, but this analysis is not provided in most software packages.
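For readers unfamiliar with the components named above, the standard textbook form of the basic structural model with an added intervention term can be written as follows. The notation is an assumption made here for illustration, and the exact specification fitted for each state may differ.

```latex
\begin{aligned}
y_t      &= \mu_t + \gamma_t + \lambda\, w_t + \varepsilon_t, \\
\mu_t    &= \mu_{t-1} + \beta_{t-1} + \eta_t, \\
\beta_t  &= \beta_{t-1} + \zeta_t, \\
\gamma_t &= -\sum_{j=1}^{s-1} \gamma_{t-j} + \omega_t .
\end{aligned}
```

Here $y_t$ is the monthly count, $\mu_t$ the level, $\beta_t$ the slope, $\gamma_t$ the seasonal component with $s = 12$ for monthly data, $w_t$ the step indicator, and $\lambda$ the intervention effect; the disturbances $\varepsilon_t$, $\eta_t$, $\zeta_t$, and $\omega_t$ are taken to be mutually uncorrelated with zero means. Fixed seasonals correspond to the special case in which the variance of $\omega_t$ is zero.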

Harvey makes a number of valuable comments. The testing procedures developed by him and his co-authors over the years have brought the structural modeling approach to the point where it provides a completely viable alternative to ARIMA modeling. Indeed, as noted above, we feel that structural modeling is superior because of the intuitive understanding provided. As a theoretical aside, we note that the alternate approach to structural modeling developed by Ord, Koehler, and Snyder (1997) provides a system with the same parameter space as the ARIMA class, whereas the original system has a more restrictive parameter space. Thus, the methodological objections to using the structural modeling approach are gradually disappearing. We look forward to using these new developments in the next version of STAMP.

Harvey's comments about the use of a proper count-based model are well taken. We admit to using more accessible software in preference to the more correct but less computationally convenient count models. When the counts are small, this may lead to erroneous conclusions for a few of the smaller states, a point noted in our paper. Also, the idea of using control groups is an excellent one and would help to neutralize many of the "rival events" cited by FQSZ.

On the Super *t*-Test, we agree that the assumption of independence was not explored and that other approaches should also be considered. However, we feel that the general conclusions would remain valid.

FQSZ's comment that the study is "quasi-experimental" is perhaps too kind, and we would place it more on the "observational" end of the spectrum. The extent to which general conclusions can be drawn really rests on the precise definition of the times at which the interventions took place in each state, followed by a check for measurable effects at those times. In this sense, the study is quasi-experimental since it is highly unlikely that any of the rival events would match up with more than a few of the specified interventions.

The commentaries suggest a variety of additional factors to be taken into account, and we are reminded of the old story about the statistician and the economist who jointly examined the results of a regression analysis. The statistician asked, "Why did you use so many variables?" to which the economist replied, "Why did you use so few?"

Our objective was to account for the broad trends and seasonal patterns in the data and, that done, to identify the effects of the changes in speed limits. Without doubt, incorporating some of these factors would serve to improve models for individual states. Further, key variables such as VMT would have been valuable had they been uniformly available. On balance, we believe that the "keep it simple" approach was appropriate for an initial study and that one of the objectives was, indeed, to stimulate thinking about more sophisticated analyses in the future.

The point about learning curves, raised by FQSZ, is an interesting one. We began with the simple intervention variable described earlier and did not hypothesize learning effects. These appear to exist in some states but by no means in all. We debated whether to modify the intervention variables to describe this more complex behavior but eventually decided to stay with our original formulation in order to avoid any charges of "data dredging." Again, this is an important question that deserves a properly designed study.

FQSZ observe that "It is clear that the effect of the speed-limit increase is specific to the individual states" and later that "The generalization ability of the authors' conclusion across states in this study is therefore uncertain." Our tabulated results show that the effects were not uniform across states; the penultimate sentence of our conclusions states that "Overall, increases were seen in *some* [italics added] states following speed limit changes." Our conclusions might have been more forcefully stated, but the results surely point to some increase for rural interstates, even though the effects vary by state. A key question for further research is why states had different levels of response and how this information might be used to improve safety.

We have already mentioned several possible directions for further research in conjunction with the comments made above. Of these, the most important ones would seem to be the study of differential responses by the states, the examination of learning effects, and the use of control groups. In addition, more focused studies that use more complete data from those states for which they are available would provide additional insights. Conversely, those states that do not provide key indicators such as VMT should give careful consideration to expanding their data collection efforts.

The other major point, made explicitly in some places and implicitly in others, is that the current analysis uses only aggregate data. The Fatality Analysis Reporting System (FARS) database provides detailed information on each fatal accident and could be used for micro-level studies to explore the impact of covariates such as those listed by FQSZ.

In conclusion, we would like to thank the commentators once again for their thoughtful and constructive suggestions, and we hope that our collective contributions will serve to advance understanding in this important area.

Ledolter, J. and K. Chan. 1994. *Safety Impact of the Increased 65-mph Speed Limit on Iowa Rural Interstates.* Final Report. Midwest Transportation Center, University of Iowa.

Ord, J.K., A.B. Koehler, and R.D. Snyder. 1997. Estimation and Prediction for a Class of Dynamic Nonlinear Statistical Models. *Journal of the American Statistical Association* 92:1621-29.