Chapter 2
Planning and Design of Data Collection Systems

BTS data collection systems must be designed to meet both internal and external user needs and the agency's legislative mandates.

This chapter covers the planning and design of data collection systems, including:

  • Establishing data needs and data collection system objectives (Section 2.1),
  • Identifying the data providers (Section 2.2),
  • Planning and designing data collection methods to meet data needs and objectives (Section 2.3), and
  • Documenting data collection plans and designs (Section 2.4).

2.1 Objectives and Requirements

Standard 2.1: Planning for a data collection system, whether it is a new system or a revision of an established system, must include:

  • Consultation with data users and providers,
  • Definition of data needs and objectives, and
  • Choice of how to meet data requirements.

Key Terms: major data users, precision

Guideline 2.1.1: Consultation with Data Users and Providers

Develop and update the data system objectives in partnership with major data users and data providers. Establish a process to consult regularly with major data users regarding changes in data needs and possible updates to the data collection system.

  • OMB requires publication of a Federal Register notice requesting public comments for all proposed information collections, administered by a federal agency, that would collect data from ten or more persons outside the federal government within a year.
  • Consultations with data users and providers should be expanded to include other means for collecting comments and suggestions, such as individual meetings, focus groups, presentations at conferences and workshops, cognitive testing, and pretests/pilot tests.
  • When revising an established data collection system, review any previous evaluation studies for information relating user needs to current system performance.

Guideline 2.1.2: Definition of Data Needs and Objectives

Establish system objectives in clear, specific terms that identify data user needs and data analysis goals before initiating data system development. Modifications required later are often difficult and expensive to implement. The definition of data needs should include:

  • What data items are needed and how they will be used,
  • The precision level required for estimates,
  • The format, level of detail, and types of tabulations and outputs, and
  • When and how frequently users need the data.

The final data collection choices will be made in the design phase (Section 2.3), taking into account constraining factors (e.g., cost, time, legal factors) and the quality of available data.

Guideline 2.1.3: Choice of How to Meet Data Requirements

Before beginning detailed planning for the collection of specific data items, review related studies and data collection systems. Determine whether all or part of the required data are already available, or could be more easily obtained by adding or modifying questions in existing federal data collections.

  • If the required information is not directly available, determine whether it can be derived or estimated using existing data sources.
  • If existing federal data collection systems meet some but not all of the data requirements, determine whether the existing data systems can be altered to meet the data requirements through, for example, an inter-agency agreement.

Related Information

Office of Management and Budget (OMB). 2005. Standards for Statistical Surveys (Proposed), Section 1.1 (Survey Planning). Washington, DC. July 14.

Stopher, P. and Jones, P., eds. 2003. Transport Survey Quality and Innovation. Oxford, UK: Pergamon.

U.S. Department of Transportation (DOT). 2002. The Department of Transportation Information Dissemination Quality Guidelines, Chapter 2 (Planning Data Systems). Washington, DC. Available at http://dms.dot.gov/ombfinal092502.pdf as of January 19, 2005.

Approval Date: August 15, 2005

2.2 Target Population and Sample Design

Standard 2.2: Planning and design must specify the proposed target population, source for lists of the target population, and (where applicable) sample design and sample size, accuracy requirements, and response rate goals.

Key Terms: accuracy, coverage, frame, response rate, target population

Guideline 2.2.1: Target Population and Frames

Lists of the units in the target population are required to obtain information from the target population. The availability of such lists (also known as frames) often constrains the method used in data collection. When a new frame is needed for a data program, develop and implement a plan for constructing the frame. The plan should cover:

  • Choice of the target population and the rationale,
  • Any exclusions that have been applied to target and/or frame populations by design,
  • Sources of lists of target population units,
  • Identification and description of other frame files which exist and whether portions of other frame files will be used to construct a new file,
  • When applicable, a description of any multistage sampling, such as geographic area sampling, that will be undertaken prior to development of lists of units and the stages in which the final lists will be developed,
  • Methods for matching and merging population lists, if applicable,
  • Data items needed for units in the frame,
  • Anticipated coverage of the target population by the frame,
    • Coverage rates in excess of 95 percent overall and for each major target population subgroup are desirable.
    • Consider using frame enhancements, such as frame supplementation or dual frame estimation, to increase coverage.
    • If the anticipated coverage falls below 85 percent, evaluate and document the potential for bias (OMB 2005).
  • Any estimation techniques used to improve the coverage of estimates, such as post-stratification procedures,
  • Other limitations of the frame including the timeliness of the frame, and
  • Projected frequency of frame updates.
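
The coverage thresholds above can be checked mechanically once frame and target population counts are available. The following is a minimal sketch, not a BTS procedure: the function names, the example subgroup counts, and the label used for the middle band (between 85 and 95 percent) are assumptions introduced for illustration.

```python
from typing import Dict, Tuple

DESIRABLE = 0.95    # guideline: coverage in excess of 95 percent is desirable
BIAS_REVIEW = 0.85  # guideline: below 85 percent, evaluate and document bias


def coverage_rate(frame_count: int, target_count: int) -> float:
    """Share of target population units that appear on the frame."""
    if target_count <= 0:
        raise ValueError("target_count must be positive")
    return frame_count / target_count


def assess_coverage(rate: float) -> str:
    """Map a coverage rate to the action suggested by the guideline."""
    if rate > DESIRABLE:
        return "desirable"
    if rate < BIAS_REVIEW:
        return "evaluate and document potential bias"
    # Middle band label is an assumption; the guideline only suggests
    # considering frame enhancements to increase coverage.
    return "consider frame enhancements"


def review_frame(groups: Dict[str, Tuple[int, int]]) -> Dict[str, str]:
    """Assess coverage overall and for each subgroup.

    `groups` maps a label to (units on frame, units in target population).
    """
    return {
        name: assess_coverage(coverage_rate(on_frame, in_target))
        for name, (on_frame, in_target) in groups.items()
    }


if __name__ == "__main__":
    # Hypothetical counts, for illustration only.
    print(review_frame({"overall": (9700, 10000), "small carriers": (800, 1000)}))
```

Running the check for each major subgroup, and not only overall, follows the guideline's wording that coverage should be assessed both ways.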

Guideline 2.2.2: Sample Design

A 100 percent data collection may be required by law, necessitated by accuracy requirements, or relatively inexpensive (e.g., data readily available). Otherwise, the sample design should include appropriate sampling methods. Any sample design chosen should ensure the sample will yield the data required to meet the objectives of the data collection.

  • Use probability sampling so that sampling error can be estimated. Any use of nonprobability sampling methods (e.g., cut-off or model-based samples) must be justified statistically and must allow estimation error to be measured.
  • The sample design should include:
    • Identification of the sampling frame and the adequacy of the frame,
    • The sampling unit used (at each stage if multistage design),
    • Criteria for stratifying or clustering,
    • Sampling strata,
    • Sample size by stratum,
    • Expected yield by stratum,
    • Sample selection procedures,
    • The known probability (or probabilities) of selection,
    • Estimated efficiency of sample design,
    • Power analyses to determine sample sizes and effective sample size for key variables by reporting domains (where appropriate),
    • Response rate goals (Guideline 4.5.3),
    • Estimation and weighting plan,
    • Variance estimation techniques appropriate to the sample design,
    • Expected precision of estimates for key variables, and
    • References for the sampling methods used.
  • For nonprobability sample designs, include a detailed selection process and demonstrate that units not in the sample are impartially excluded on objective grounds.
  • Discuss potential nonsampling errors, including reporting errors, response variance, measurement bias, nonresponse, imputation error, and errors in processing the data. Indicate steps to be taken to minimize the effect of these problems on the data.
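
Several of the sample design elements listed above (sample size, sample size by stratum, and expected yield given a response rate goal) can be illustrated with a short calculation. The sketch below assumes simple random sampling within strata, uses the standard sample size formula for a proportion with a finite population correction, and applies proportional allocation. The stratum names, population counts, margin of error, and response rate are hypothetical, and an actual design may call for a different allocation or formula.

```python
import math
from typing import Dict


def sample_size_for_proportion(population: int, margin: float,
                               p: float = 0.5, z: float = 1.96) -> int:
    """Sample size to estimate a proportion within +/- margin.

    n0 = z^2 * p * (1 - p) / margin^2, then a finite population
    correction n = n0 / (1 + (n0 - 1) / N). p = 0.5 is the most
    conservative choice; z = 1.96 corresponds to 95 percent confidence.
    """
    n0 = (z ** 2) * p * (1 - p) / (margin ** 2)
    n = n0 / (1 + (n0 - 1) / population)
    return math.ceil(n)


def proportional_allocation(strata: Dict[str, int], n: int) -> Dict[str, int]:
    """Allocate n across strata in proportion to stratum size.

    Rounding means the allocations may not sum exactly to n.
    """
    total = sum(strata.values())
    return {name: round(n * size / total) for name, size in strata.items()}


def units_to_field(target_completes: int, expected_response_rate: float) -> int:
    """Units to select so the expected yield meets the target."""
    return math.ceil(target_completes / expected_response_rate)


if __name__ == "__main__":
    # Hypothetical strata and population counts.
    strata = {"air": 2000, "rail": 3000, "truck": 5000}
    n = sample_size_for_proportion(sum(strata.values()), margin=0.05)
    allocation = proportional_allocation(strata, n)
    fielded = {s: units_to_field(k, 0.8) for s, k in allocation.items()}
    print(n, allocation, fielded)
```

Proportional allocation is used here only because it is the simplest case; where stratum variances or costs differ, other allocations may be more efficient.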

Related Information

Bureau of Transportation Statistics. 2004. Confidentiality Procedures Manual. Washington, DC.

__________. 2005. BTS Statistical Standards Manual, Section 3.2 (Frame Maintenance and Updates). Washington, DC. Available at http://www.bts.gov/programs/statistical_policy_and_research/bts_statistical_standards_manual/index.html as of July 29, 2005.

Cochran, W.G. 1977. Sampling Techniques, 3rd ed. New York: Wiley.

Office of Management and Budget (OMB). 2004. Questions and Answers When Designing Surveys for Information Collection. Washington, DC. December 6.

__________. 2005. Standards for Statistical Surveys (Proposed), Section 1.2 (Survey Design) and Section 2.1 (Developing Sampling Frames). Washington, DC. July 14.

Särndal, C.-E., Swensson, B., and Wretman, J. 1991. Model Assisted Survey Sampling. New York: Springer Verlag.

U.S. Department of Transportation (DOT). 2002. The Department of Transportation Information Dissemination Quality Guidelines, Chapter 2 (Planning Data Systems). Washington, DC. Available at http://dms.dot.gov/ombfinal092502.pdf as of January 19, 2005.

Wolter, K.M. 1985. Introduction to Variance Estimation. New York: Springer Verlag.

Approval Date: August 15, 2005

2.3 Data Collection Methods

Standard 2.3: The design and planning for data collection must include:

  • The detailed methods to be used to collect data,
  • The data collection instruments and associated instructions,
  • A pretest for new data collection systems, or existing systems with major revisions, and
  • Plans for the dissemination of major resulting information products to the public.

Key Terms: bias, bridge study, collection instrument, confidentiality, crosswalk, key variable, measurement error, response rate

Guideline 2.3.1: Methods of Obtaining Data

The data collection method should be appropriate to the nature, amount, and complexity of the data requested, the number of data providers, available resources, and the amount of time available.

  • Determine the method, or combination of methods, of data collection (e.g., mail, telephone, Internet, etc.) that is appropriate for the target population and the objectives of the data program. The determination should include consideration of the likely effect of method choice on response rates.
  • Establish a data collection period that allows sufficient response time for data providers to supply reliable data, including time to follow up on missing data, and meets the required dissemination schedule.
  • Develop a plan for confidentiality protection (BTS 2004) during sampling, data collection, processing, data analysis, and dissemination.
  • Develop plans for data processing, including data editing and imputation (BTS 2005, Chapter 4).
  • Plan for quality assurance during each phase of the data collection process to permit monitoring and assessing the performance during implementation. Include contingencies to modify the procedures if critical requirements (e.g., for the response rate) are not met.
  • Establish a formal training process for persons involved in interviewing, observing, or reporting data to ensure that the intended procedures are followed.
  • If redesigning an existing data system, analyze and document the potential impact of changes in key variables or data collection procedures.
  • Plan for evaluating data collection and processing procedures, results, and potential biases.
  • Develop general specifications for an internal project management system for the complete data collection cycle that identifies critical activities and key milestones that will be monitored, and the time relationships among them.
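
As one illustration of the quality assurance point above, a collection can be monitored against its response rate goal so that contingency procedures are triggered when the goal is not being met. This is a minimal sketch that assumes a simple unweighted response rate (completed units divided by eligible units); the function names and the 70 percent goal are hypothetical, and a program may define its response rate differently.

```python
def response_rate(completed: int, eligible: int) -> float:
    """Unweighted response rate: completed units over eligible units."""
    if eligible <= 0:
        raise ValueError("eligible must be positive")
    return completed / eligible


def needs_contingency(completed: int, eligible: int, goal: float) -> bool:
    """True when the observed response rate falls short of the goal."""
    return response_rate(completed, eligible) < goal


if __name__ == "__main__":
    # Hypothetical mid-collection counts checked against a 70 percent goal.
    if needs_contingency(completed=620, eligible=1000, goal=0.70):
        print("Response rate below goal: apply contingency procedures")
```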

Guideline 2.3.2: Instruments and Instructions

Design the data collection instrument in a manner that maximizes data quality, while minimizing respondent burden:

  • Do not use instrument formats that are inappropriate for the method of data collection. For example, if using a self-administered collection instrument, limit skip patterns to ease navigation.
  • Develop clearly written instructions to help reporters minimize missing data and measurement error.
  • Require that data items are clearly defined in terms the reporters understand, with entries in a logical sequence and with reasonable visual cues and instrument formatting (if applicable). Pretest to identify problems with interpretability.
  • Structure the order and presentation of data items such that responses do not unduly influence responses to subsequent items.
  • Minimize the number of data calculations and conversions the reporter must make.
  • For computer-assisted and other forms of electronic data collection (using GPS devices, sensors, etc.):
    • Test for validity and reliability under conditions similar to those of the planned data collection.
    • Develop protocols for the backup and recovery of data.
    • If possible, have alternate methods of data collection available in case of equipment failure. Otherwise, develop plans to impute or adjust for faulty or missing observations.
  • Establish protocols that minimize measurement error, such as conducting response analysis surveys that ensure records exist for data elements requested for business data collections, establishing recall periods that are reasonable for personal data collections, and developing computer systems that ensure internet data collections function properly.

Guideline 2.3.3: Standard Codes and Classifications

To allow data comparisons across databases, use standard names, variables, numerical units, codes, and definitions. Use codes and classifications consistent with the federal coding standards listed below, if applicable. If a federal coding standard does not exist, consult with subject area experts to determine if applicable non-federal standards exist. Provide crosswalk tables to the federal standard codes for any legacy coding that does not meet the federal standards. These codes are updated periodically. Current federal standard codes include:

  • FIPS Codes. The National Institute of Standards and Technology (NIST n.d.) maintains Federal Information Processing Standards (FIPS) required for use in federal information processing in accordance with OMB Circular A-130. The following FIPS should be used for coding:
    • 5-2, Codes for the Identification of the States, the District of Columbia and the Outlying Areas of the United States, and Associated Areas.
    • 6-4, Counties and Equivalent Entities of the U.S., Its Possessions, and Associated Areas.
    • 10-4, Countries, Dependencies, Areas of Special Sovereignty and Their Principal Administrative Divisions.
  • Statistical Areas. OMB (2005b) defines Metropolitan Statistical Areas, Micropolitan Statistical Areas, Combined Statistical Areas, and New England City and Town Areas for use in Federal statistical activities. These areas, as well as principal cities, are updated annually to reflect changes in population estimates.
  • NAICS Codes. The North American Industry Classification System (NAICS) should be used to classify establishments (U.S. Census Bureau n.d.). NAICS was developed jointly by the United States, Canada, and Mexico to provide new comparability in statistics about business activity across North America. (NAICS coding replaced the U.S. Standard Industrial Classification (SIC) system.)
  • SOC Codes. The Standard Occupational Classification (SOC) system (BLS 2000) should be used to classify workers into occupational categories for the purpose of collecting, calculating, or disseminating data.
  • Race and Ethnicity. Classification of race and ethnicity, as well as methods of collection, should comply with OMB's Standards for Maintaining, Collecting, and Presenting Federal Data on Race and Ethnicity (OMB 2000).
  • Aviation. The International Air Transport Association, an airline industry association, establishes standard codes for airlines and airport locations (IATA n.d.). The BTS Office of Airline Information also develops and maintains Aviation Support Tables (BTS n.d.) that provide standard codes and other information for air carriers (U.S. and foreign), worldwide airport locations, and for aircraft types and models. The BTS codes do not always agree with IATA coding.
  • Standard Classification of Transported Goods (SCTG) Reporting System Codes. The SCTG coding system (Statistics Canada n.d.) was created by the U.S. and Canadian governments, and is used to address statistical needs regarding the transportation of products.
  • United Nations (UN) Numbers and North American (NA) Numbers. UN numbers are four digit numbers used worldwide to identify different hazardous materials. The UN numbers are developed through the framework of the United Nations Model Regulations on the Transport of Dangerous Goods. NA numbers are assigned by the U.S. and Canada to hazardous materials that have not been assigned a UN number. The PHMSA Office of Hazardous Materials Safety (PHMSA n.d.) maintains a consolidated table of hazardous materials codes and information.
  • Injury Codes. The International Classification of Diseases, Ninth Revision, Clinical Modification (ICD-9-CM) (NCHS n.d.) is the official system of assigning codes to diagnoses and procedures associated with hospital utilization in the United States. The E-codes in this manual are for injuries. Transportation-related injuries span E800 to E848.
  • Human Factors Codes. The FAA Office of Aviation Medicine (FAA 2000) uses The Human Factors Analysis and Classification System—HFACS.

Guideline 2.3.4: Pretesting

For new data collections or major revisions of ongoing collections, all components must be pretested so that they minimize measurement error and function as intended prior to full implementation.

  • One component of pretesting is a pilot test in which some components of a data collection can be pretested prior to a field test of the data collection (for example, using focus groups, cognitive laboratory work, and/or calibration studies).
  • Another component of pretesting is a field test. Components of a data collection that cannot be successfully demonstrated through previous work should be field tested prior to implementation of the full-scale data collection. The design of a field test should reflect realistic conditions, including those likely to pose difficulties for the data collection.

Guideline 2.3.5: Proposed Data Analysis and Information Products

Develop a dissemination agenda that identifies proposed major information products, timing of release, and their target audiences.

  • Proposed data analysis should identify issues, objectives, and key variables, and be linked to the questions the data collection was intended to answer.
  • Develop adjustment methods, such as crosswalks and bridge studies, that will be used to preserve trend analyses and inform users about the impact of changes.
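
One simple way a bridge study can be used to preserve a trend is to measure the same period under both the old and the new method and use the ratio of the two estimates to restate the historical series. The sketch below assumes a single constant ratio adjustment, which is only one of several possible bridging approaches; the function names and the figures are hypothetical.

```python
from typing import List


def bridge_factor(old_method_estimate: float, new_method_estimate: float) -> float:
    """Ratio linking the old method to the new one for an overlap period."""
    if old_method_estimate == 0:
        raise ValueError("old_method_estimate must be nonzero")
    return new_method_estimate / old_method_estimate


def restate_series(old_series: List[float], factor: float) -> List[float]:
    """Restate historical estimates on the new method's basis."""
    return [value * factor for value in old_series]


if __name__ == "__main__":
    # Hypothetical series: the last period was measured under both methods.
    old_series = [100.0, 110.0, 120.0]
    factor = bridge_factor(old_method_estimate=120.0, new_method_estimate=126.0)
    print(factor, restate_series(old_series, factor))
```

Publishing the factor alongside the restated series is one way to inform users about the size of the break caused by the change.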

Related Information

Bureau of Labor Statistics (BLS). 2000. Standard Occupational Classification (SOC) System. Available at http://www.bls.gov/soc/ as of November 15, 2004.

Bureau of Transportation Statistics (BTS). n.d. Aviation Support Tables. Office of Airline Information: Washington, DC. Available at http://www.transtats.bts.gov/Tables.asp?DB_ID=595&DB_Name=Aviation%20Support%20Tables&DB_Short_Name=Aviation%20Support%20Tables as of July 20, 2005.

__________. 2004. Confidentiality Procedures Manual. Washington, DC.

__________. 2005. BTS Statistical Standards Manual, Chapters 3-6. Washington, DC. Available at http://www.bts.gov/programs/statistical_policy_and_research/bts_statistical_standards_manual/index.html as of July 29, 2005.

Energy Information Administration (EIA). 2002. EIA Standards Manual, Standard EIA 2002-5 (Frames Development and Maintenance) and Standard 2002-4 Supplementary Materials, Forms Design Checklist. Washington, DC. Available at http://www.eia.doe.gov/smg/Standard.pdf as of January 25, 2005.

Federal Aviation Administration (FAA). 2000. The Human Factors Analysis and Classification System—HFACS. DOT/FAA/AM-00/7. Office of Aviation Medicine: Washington, DC. Available at http://www.hf.faa.gov/Portal/ShowProduct.aspx?ProductID=54 as of June 15, 2005.

International Air Transport Association (IATA). n.d. Airline Coding Directory. London, UK. Available at http://www.iata.org/ps/publications/9095.htm as of July 26, 2005.

National Center for Health Statistics (NCHS). n.d. The International Classification of Diseases, Ninth Revision, Clinical Modification (ICD-9-CM). Available at http://www.cdc.gov/nchs/about/otheract/icd9/abticd9.htm as of June 14, 2005.

National Institute of Standards and Technology (NIST). n.d. Federal Information Processing Standards Publications. Available at http://www.itl.nist.gov/fipspubs/index.htm as of November 15, 2004.

Office of Management and Budget (OMB). 2000. Provisional Guidance on the Implementation of the 1997 Standards for Federal Data on Race and Ethnicity. Available at http://www.whitehouse.gov/omb/inforeg/statpolicy.html#dr as of November 15, 2004.

__________. 2004. Questions and Answers When Designing Surveys for Information Collection. Washington, DC. December 6.

__________. 2005a. Standards for Statistical Surveys (Proposed), Section 3.3 (Coding). Washington, DC. May 19.

__________. 2005b. Update of Statistical Area Definitions and Guidance on Their Uses. Available at http://www.whitehouse.gov/omb/inforeg/statpolicy.html#ms as of July 15, 2005.

Pipeline and Hazardous Materials Safety Administration (PHMSA). n.d. Hazmat Table. Office of Hazardous Materials Safety: Washington, DC. Available at http://www.myregs.com/dotrspa/ as of July 20, 2005.

Presser, S., Rothgeb, J.M., Couper, M.P., Lessler, J.T., Martin, M., Martin, J., and Singer, E. 2004. Methods for Testing and Evaluating Survey Questionnaires. New York: Wiley.

Statistics Canada. n.d. Standard Classification of Transported Goods (SCTG). Ottawa, Canada. Available at http://www.statcan.ca/english/Subjects/Standard/sctg/sctg-intro.htm as of June 14, 2005.

Stopher, P. and Jones, P., eds. 2003. Transport Survey Quality and Innovation. Oxford, UK: Pergamon.

Sudman, S., Bradburn, N., and Schwarz, N. 1996. Thinking about Answers: The Application of Cognitive Processes to Survey Methodology. San Francisco: Jossey-Bass.

U.S. Census Bureau. n.d. The North American Industry Classification System (NAICS). Washington, DC. Available at http://www.census.gov/epcd/www/naics.html as of November 15, 2004.

U.S. Department of Transportation (DOT). 2002. The Department of Transportation Information Dissemination Quality Guidelines, Chapter 2 (Planning Data Systems). Washington, DC. Available at http://dms.dot.gov/ombfinal092502.pdf as of January 19, 2005.

Approval Date: August 15, 2005

2.4 Documents and Documentation

Standard 2.4: Planning activities must include the documentation of user needs and design decisions as well as the preparation of required administrative documents.

Key Terms: coverage, frame, target population

Guideline 2.4.1: Documentation of Data Needs

After establishing the data needs and requirements, prepare a detailed technical document that describes the goals and objectives of the data collection, including:

  • A summary of the consultations with major data users and data providers, plus any other sources consulted,
  • The information needs that will be met, including the desired accuracy, timeliness, and dissemination format(s) for the data, and
  • The choices made for meeting data needs and their relationship to the requirements.

Guideline 2.4.2: Target Population and Frames Documentation

Describe the target populations and associated frames (lists of population units) in detail. Include a discussion of coverage issues (Guideline 2.2.1).

Guideline 2.4.3: Sample Design Documentation

If sampling is part of the data collection design, prepare a detailed description of the sample design (Guideline 2.2.2) and how it will yield the data required to meet the objectives of the data collection. When a nonprobabilistic sampling method is employed, the survey design documentation should include:

  • A discussion of what options were considered and why the final design was selected,
  • An estimate of the potential bias in the estimates, and
  • The methodology to be used to measure estimation error.

Guideline 2.4.4: Collection and Processing Methodology Documentation

Document the collection design and its connection to the data requirements (Section 2.3). The documentation should include the methods of obtaining data, copies of the data collection instrument and instructions, pretest design and findings, and plans for disseminating the results of the data collection to the public.

Guideline 2.4.5: Administrative Documents

Comply with the following requirements as part of the data collection planning and design:

  • When planning and design is in its initial stages, prepare a project plan specifying schedules and resource requirements in the format specified by BTS management.
  • Data collections (and related activities such as focus groups, cognitive interviews, pilot studies, field tests, etc.) are all collections of information subject to the requirements of the Paperwork Reduction Act of 1995 (P.L. 104-13, 44 U.S.C. 3501 et seq.) and OMB's regulations (5 CFR Part 1320, Controlling Paperwork Burdens on the Public). OMB approval is required before the agency may collect information from ten or more persons outside the Federal government in a twelve-month period. The documentation specified in this section can all be used in Part B of the submission to OMB (OMB 2004a).
  • Projects that require a new IT investment or significant modification of an existing IT investment must go through the Capital Planning and Investment Control process.
  • Contracts should include language stating that the contractor shall comply with all standards and guidelines contained in the BTS Statistical Standards Manual and the BTS Confidentiality Procedures Manual.

Related Information

Bureau of Transportation Statistics. 2004. Confidentiality Procedures Manual. Washington, DC.

__________. 2005. BTS Statistical Standards Manual. Washington, DC. Available at http://www.bts.gov/programs/statistical_policy_and_research/bts_statistical_standards_manual/index.html as of July 29, 2005.

Office of Management and Budget (OMB). 2004a. Paperwork Reduction Act Submission (Form OMB 83-I). Washington, DC. February. Available at http://www.whitehouse.gov/omb/inforeg/83i-fill.pdf as of June 15, 2005.

__________. 2004b. Questions and Answers When Designing Surveys for Information Collection. Washington, DC. December 6.

U.S. Department of Transportation (DOT). 2002. The Department of Transportation Information Dissemination Quality Guidelines, Chapter 2 (Planning Data Systems). Washington, DC. Available at http://dms.dot.gov/ombfinal092502.pdf as of January 19, 2005.

Approval Date: August 15, 2005