BTS employs a wide variety of statistical techniques in its work. Regardless of the technique used, however, certain steps should be included in any data analysis. This chapter provides general guidance on those steps and leaves the choice of analytical tools to the data analyst performing the work.
This chapter contains standards for planning a data analysis (Section 5.1), calculating estimates and performing inferences (Section 5.2), and documenting the data analysis (Section 5.3). For quick-response projects, compliance with these standards is recommended, but not required.
Standard 5.1: Plan before starting a specific data analysis to ensure that the resulting product addresses the needs of BTS customers and that the resources are available to complete the data analysis.
Key Terms: key variable, target audience
The data analysis should be relevant, objective, and comprehensive, and should add value to existing information. To meet these goals, data analysts need to:
Prepare a data analysis plan in the proper format (BTS 2004) prior to the start of the data analysis.
Bureau of Transportation Statistics (BTS). 2004. BTS Information Product Scoping Paper. Washington, DC.
Approval Date: June 28, 2005
Standard 5.2: Estimates and statistical inferences made regarding the data must be based on acceptable statistical practice.
Key Terms: accuracy, bias, bridge estimates, estimates, inference, reliability, robustness, time series, trend, variance
Analyses must use theory and methods justifiable by reference to statistical literature (provided below in Related Information) or by mathematical derivation.
Statistical statements should be accompanied by some assessment of the limitations and uncertainty of the results.
Support statistical statements with proper testing and inference procedures.
If the scope of data collection changes or part of an historical series is revised, data for both the old and the new series should be published for a suitable overlap period.
State all statistical assumptions (such as assumptions about data distributions or structured dependence) made during the data analysis.
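Two of the guidelines above, reporting uncertainty alongside an estimate and publishing an overlap period when a series is revised, can be illustrated with a minimal Python sketch. It is not a BTS-prescribed method. The function names `mean_with_interval` and `bridge_factor` are hypothetical, the interval assumes an independent simple random sample large enough for a normal approximation, and the ratio of totals is only one simple way to relate an old and a new series over their overlap.

```python
import math
import statistics


def mean_with_interval(values, z=1.96):
    """Return (estimate, standard_error, (lower, upper)) for a sample mean.

    Stated assumptions: the values are an independent simple random sample,
    and the sample is large enough for a normal approximation, so z = 1.96
    gives an approximate 95 percent confidence interval.
    """
    n = len(values)
    if n < 2:
        raise ValueError("need at least two observations to estimate variance")
    estimate = statistics.mean(values)
    # Sample standard deviation (n - 1 denominator) divided by sqrt(n).
    standard_error = statistics.stdev(values) / math.sqrt(n)
    margin = z * standard_error
    return estimate, standard_error, (estimate - margin, estimate + margin)


def bridge_factor(old_overlap, new_overlap):
    """Return the ratio of new-series to old-series totals over the overlap.

    Both arguments cover the same overlap periods, in the same order.
    Assumption: a single multiplicative factor adequately relates the series.
    """
    if len(old_overlap) != len(new_overlap):
        raise ValueError("overlap periods must match one to one")
    old_total = sum(old_overlap)
    if old_total == 0:
        raise ValueError("old series total is zero; ratio is undefined")
    return sum(new_overlap) / old_total


if __name__ == "__main__":
    # Illustrative numbers only, not BTS data.
    estimate, se, (low, high) = mean_with_interval([10, 12, 14, 16, 18])
    print(f"estimate={estimate:.2f} se={se:.2f} interval=({low:.2f}, {high:.2f})")
    print(f"bridge factor={bridge_factor([100, 110], [105, 115.5]):.3f}")
```

For data from a complex survey design, the standard error would instead come from a design-based variance estimator, such as those covered in the survey sampling references below.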
Agresti, A. 1990. Categorical Data Analysis. New York, NY: Wiley.
Anderson, T.W. 2003. An Introduction to Multivariate Statistical Analysis, 3rd ed. New York: Wiley.
Box, G.E.P., Jenkins, G.M., and Reinsel, G.C. 1994. Time Series Analysis: Forecasting and Control, 3rd ed. New York: Prentice Hall.
Casella, G. and Berger, R.L. 2001. Statistical Inference, 2nd ed. Belmont, CA: Duxbury Press.
Chatfield, C. 2003. The Analysis of Time Series: An Introduction, 6th ed. New York: Chapman and Hall.
Cleveland, W.S. 1993. Visualizing Data. Summit, NJ: Hobart Press.
Cochran, W.G. 1977. Sampling Techniques, 3rd ed. New York: Wiley.
Cook, R.D. and Weisberg, S. 1999. Applied Regression Including Computing and Graphics. New York: Wiley.
Cressie, N. 1991. Statistics for Spatial Data. New York: Wiley.
Daniel, C. and Wood, F.S. 1980. Fitting Equations to Data. New York: Wiley.
DeGroot, M.H. 1989. Probability and Statistics. Reading, MA: Addison-Wesley.
Diggle, P.J., Liang, K.-Y., and Zeger, S.L. 2000. Analysis of Longitudinal Data. Oxford: Oxford University Press.
Draper, N.R. and Smith, H. 1998. Applied Regression Analysis, 3rd ed. New York: Wiley.
Efron, B. and Tibshirani, R.J. 1994. An Introduction to the Bootstrap. New York: Chapman and Hall.
Fleiss, J.L. 1981. Statistical Methods for Rates and Proportions, 2nd ed. New York: Wiley.
Hampel, F.R., Ronchetti, E.M., Rousseeuw, P.J., and Stahel, W.A. 2005. Robust Statistics: The Approach Based on Influence Functions, rev. ed. New York: Wiley.
Harvey, A.C. 1993. Time Series Models, 2nd ed. Cambridge, MA: MIT Press.
Hicks, C.R., and Turner, K.V. 1999. Fundamental Concepts in the Design of Experiments. Oxford, UK: Oxford University Press.
Hogg, R.V., Craig, A., and McKean, J.W. 2004. Introduction to Mathematical Statistics, 6th ed. New York: Prentice Hall.
Hosmer, D.W., and Lemeshow, S. 1989. Applied Logistic Regression. New York: Wiley.
Huber, P.J. 1981. Robust Statistics. New York: Wiley.
Kelsey, J.L., Whittemore, A.S., Evans, A.S., and Thompson, W.D. 1996. Methods in Observational Epidemiology. New York: Oxford University Press.
Kleinbaum, D.G., Kupper, L.L., and Muller, K.E. 1988. Applied Regression Analysis and Other Multivariable Methods. Boston: PWS-Kent.
Lehmann, E.L. and Romano, J.P. 2005. Testing Statistical Hypotheses, 3rd ed. New York: Springer Verlag.
Lehmann, E.L. and Casella, G. 1998. Theory of Point Estimation, 2nd ed. New York: Springer Verlag.
Little, R.J.A. and Rubin, D. 1987. Statistical Analysis with Missing Data. New York: Wiley.
McCulloch, C.E. and Searle, S.R. 2001. Generalized, Linear, and Mixed Models. New York: Wiley.
Mood, A.M., Graybill, F.A., and Boes, D.C. 1974. Introduction to the Theory of Statistics. New York: McGraw-Hill.
Office of Management and Budget (OMB). 2005. Standards for Statistical Surveys (Proposed), Sections 4.1 (Developing Estimates and Projections) and 5.2 (Inference and Comparisons). Washington, DC. July 14.
Pankratz, A. 1983. Forecasting with Univariate Box-Jenkins Models. New York: Wiley.
Rao, C.R. 1973. Linear Statistical Inference and Its Applications, 2nd ed. New York: Wiley.
Rohatgi, V.K. 1976. An Introduction to Probability Theory and Mathematical Statistics. New York: Wiley.
__________. 1984. Statistical Inference. New York: Wiley.
Rousseeuw, P.J., and Leroy, A.M. 1987. Robust Regression and Outlier Detection. New York: Wiley.
Särndal, C.-E., Swensson, B., and Wretman, J. 1991. Model Assisted Survey Sampling. New York: Springer Verlag.
Scheffé, H. 1959. The Analysis of Variance. New York: Wiley.
Searle, S.R., Casella, G., and McCulloch, C.E. 1992. Variance Components. New York: Wiley.
Seber, G.A.F., and Lee, A.J. 2003. Linear Regression Analysis, 2nd ed. New York: Wiley.
Selvin, S. 1996. Statistical Analysis of Epidemiologic Data. Oxford, UK: Oxford University Press.
Skinner, C., Holt, D., and Smith, T. 1989. Analysis of Complex Surveys. New York: Wiley.
Snedecor, G.W. and Cochran, W.G. 1989. Statistical Methods, 8th ed. Ames, IA: Iowa State University Press.
Tukey, J. 1977. Exploratory Data Analysis. Reading, MA: Addison-Wesley.
U.S. Department of Transportation. 2002. The Department of Transportation Information Dissemination Quality Guidelines, Appendix A, Sections 4.3 (Production of Estimates and Projections) and 4.4 (Data Analysis and Interpretation). Available at http://dms.dot.gov/ombfinal092502.pdf as of January 19, 2005.
Wolter, K.M. 1985. Introduction to Variance Estimation. New York: Springer Verlag.
Zacks, S. 1971. Theory of Statistical Inference. New York: Wiley.
Approval Date: June 28, 2005
Standard 5.3: Document the methods and models used in data analysis products to help ensure objectivity, utility, transparency, and reproducibility of the estimates and projections.
Key Terms: reproducibility, transparency
The data analysis report must contain details of the methods used during the data analysis, including a description of software used, a discussion of the data analysis assumptions, and key information relevant to obtaining the data analysis results.
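One possible way to capture the software description and assumptions for a data analysis report is sketched below in Python. This is an illustrative assumption, not a BTS-required format: the function `analysis_record`, its field names, and the example package entry are hypothetical, and the package versions are supplied by the analyst rather than detected automatically.

```python
import json
import platform


def analysis_record(methods, assumptions, packages=None):
    """Collect reproducibility details for the analysis documentation.

    methods and assumptions are lists of short plain-language statements.
    packages is an optional mapping of package name to version string,
    filled in by the analyst.
    """
    return {
        "python_version": platform.python_version(),
        "platform": platform.platform(),
        "packages": dict(packages or {}),
        "methods": list(methods),
        "assumptions": list(assumptions),
    }


if __name__ == "__main__":
    # Example entries are placeholders, not a real analysis.
    record = analysis_record(
        methods=["sample mean with normal-approximation confidence interval"],
        assumptions=["observations are an independent simple random sample"],
        packages={"example-package": "1.0.0"},
    )
    # JSON text can be pasted into, or attached to, the analysis report.
    print(json.dumps(record, indent=2))
```

Keeping this record with the report lets a reader see which software and assumptions produced the published results.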
Bureau of Transportation Statistics (BTS). 2005. BTS Statistical Standards Manual, Section 6.8 (Public Documentation), Washington, DC. Available at http://www.bts.gov/programs/statistical_policy_and_research/bts_statistical_standards_manual/index.html, as of June 10, 2005.
Office of Management and Budget (OMB). 2002. Guidelines for Ensuring and Maximizing the Quality, Objectivity, Utility, and Integrity of Information Disseminated by Federal Agencies. Federal Register, Vol. 67, No. 36, pp. 8452-8460. Washington, DC. February 22.
__________. 2005. Standards for Statistical Surveys (Proposed), Section 4.1 (Developing Estimates and Projections). Washington, DC. July 14.