Chapter 3: Collection of Data
Data collection includes all the processes involved in implementing a planned design to acquire data. Common types of data collection include:
- Regulatory data collections (e.g., the airline traffic data required by 14 CFR 241),
- Administrative data collections (e.g., the border crossing data), and
- Surveys (e.g., the Commodity Flow Survey).
In cases where BTS conducts or sponsors the data collections, BTS has control over the collection process. BTS also uses data from external sources. In these cases, BTS has little or no control over the data collection. External-source data vary in importance for BTS use. Some data, such as the border crossing data, BTS both disseminates and uses in further analyses. BTS uses other external-source data only incidentally in analysis reports.
This chapter contains standards for acquiring data from external sources (Section 3.1), maintaining the frame (list of the target population) (Section 3.2), conducting data collection operations (Section 3.3), and documenting the data collection process (Section 3.4). Except for the guideline on confidentiality protection (Guideline 3.3.4), only Section 3.1 is required for incidentally used external data.
Standard 3.1: Data that BTS acquires from external sources must be evaluated and understood in order to assess the quality for the intended BTS use.
Key Terms: confidentiality, external source
Guideline 3.1.1: Obtaining External Data
Obtain the highest quality version of the external data that is available from the source.
- Verify that the data set is the latest version, and that no corrected or revised data are available for the current or previous time periods. Keep a backup copy of the data.
- Obtain the most complete data documentation available for the corresponding time periods. Acquire any available documentation that can be used to assess data quality.
- Evaluate data from external sources for data quality before deciding whether the data are appropriate for the intended BTS use. The level of BTS's evaluation effort should depend on the thoroughness of the external source's quality control and on the importance of the data for the intended use.
Guideline 3.1.2: Confidential External Data
If the external data are confidential or proprietary, written agreements to acquire the data must stipulate the confidentiality requirements for protecting them.
Bureau of Transportation Statistics. 2005. BTS Statistical Standards Manual, Chapter 2 (Data Collection Planning and Design) and Chapter 4 (Processing of Data). Washington, DC.
Office of Management and Budget. 2002. Guidelines for Ensuring and Maximizing the Quality, Objectivity, Utility, and Integrity of Information Disseminated by Federal Agencies, Public Comments and OMB Response (Applicability of Guidelines). Federal Register, Vol. 67, No. 36, pp. 8453-8454. Washington, DC. February 22.
U.S. Department of Transportation (DOT). 2002. The Department of Transportation Information Dissemination Quality Guidelines, Appendix A, Section 1.3 (Applicability). Washington, DC. Available at http://dms.dot.gov/ombfinal092502.pdf as of January 19, 2005.
Approval Date: April 20, 2005
Standard 3.2: Frames (lists of potential data providers) must be maintained, updated, evaluated, and archived to ensure that coverage is as complete and current as possible.
Key Terms: administrative data collection, bias, coverage, frame, regulatory data collection, target population
Guideline 3.2.1: Maintaining Coverage
Frames must be maintained and updated.
- Maintenance is the continuous revision of the frame based on new information that becomes available during data collection. For regulatory or administrative data collections, frame maintenance requires that changes related to reporting eligibility are promptly reflected in the data collection system.
- Updates are systematic, comprehensive searches for frame changes that canvass all available information. Updates can also include re-examination of reporting categories using more recent information, such as reclassifying airlines based on annual operating revenues.
Maintenance and updating actions include:
- Additions of new potential data providers,
- Revisions due to changes in ownership, name, or address,
- Changes in how data providers are classified (for reporting or sampling purposes), and
- Deletions of data providers no longer in the target population.
Guideline 3.2.2: Coverage Evaluation
In addition to routine maintenance and updates, periodically evaluate target population coverage of frames that are used for recurring data collections.
- The frequency of coverage evaluations depends on the relative stability of the target population and on the frequency of data collection.
- Evaluate coverage of administrative or regulatory data collections at least annually.
- If the frame is properly maintained and updated, problems in coverage for regulatory-based systems can be avoided.
- Conduct an evaluation of the potential bias if the frame's coverage of the target population falls below 85 percent (OMB 2005).
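The 85 percent trigger in the last bullet can be sketched as a simple check. This is only an illustration, assuming that an independent benchmark list of target-population identifiers is available for comparison; the function names and identifiers below are hypothetical and not part of the standard.

```python
# Hypothetical sketch: compare a frame against a benchmark list of the
# target population and flag coverage below the 85 percent guideline.

COVERAGE_THRESHOLD = 0.85  # threshold cited in the guideline (OMB 2005)


def coverage_rate(frame_ids, target_ids):
    """Return the share of target-population units found on the frame."""
    target = set(target_ids)
    if not target:
        raise ValueError("target population list is empty")
    return len(target & set(frame_ids)) / len(target)


def needs_bias_evaluation(frame_ids, target_ids, threshold=COVERAGE_THRESHOLD):
    """True when frame coverage falls below the threshold."""
    return coverage_rate(frame_ids, target_ids) < threshold


# Example with made-up identifiers: 3 of 4 target units are on the frame.
frame = ["C001", "C002", "C003"]
target = ["C001", "C002", "C003", "C004"]
print(coverage_rate(frame, target))          # 0.75
print(needs_bias_evaluation(frame, target))  # True
```

In practice the benchmark list is itself imperfect, so a result near the threshold would call for judgment rather than a mechanical decision.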
Guideline 3.2.3: Archiving
Frames are a critical component of data collection and documentation. A backup copy of the current frame must be created and archived prior to each major frame update (or periodically, for continuously maintained frames).
- All active and inactive data providers must be included on the archive file.
- Inactive records may be periodically deleted from the current file, after the prior file has been archived.
- During a frame update, information on potential data providers should not be deleted from the frame. Instead, a status indicator field in the frame should designate whether the entry is active/inactive or in-scope/out-of-scope.
- Whenever the information contained in a frame is modified, record the effective date of the change.
- Provide a way of tracking changes in frame record identifiers over time.
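One possible way to organize a frame along these lines is sketched below. It assumes an in-memory frame keyed by provider identifier; the class names, field names, and the crosswalk structure are illustrative assumptions, not a prescribed BTS design.

```python
# Hypothetical sketch of a frame that keeps inactive entries, records
# effective dates, tracks identifier changes, and supports archiving.
import copy
from dataclasses import dataclass, field
from datetime import date
from typing import Dict, List, Tuple


@dataclass
class FrameRecord:
    provider_id: str
    name: str
    effective_date: date      # date the most recent change took effect
    status: str = "active"    # "active" or "inactive"
    in_scope: bool = True     # in-scope / out-of-scope indicator


@dataclass
class Frame:
    records: Dict[str, FrameRecord] = field(default_factory=dict)
    # Each entry is (old_id, new_id, effective_date).
    id_crosswalk: List[Tuple[str, str, date]] = field(default_factory=list)

    def add(self, record: FrameRecord) -> None:
        self.records[record.provider_id] = record

    def deactivate(self, provider_id: str, effective: date) -> None:
        """Mark a provider inactive instead of deleting it."""
        record = self.records[provider_id]
        record.status = "inactive"
        record.effective_date = effective

    def change_id(self, old_id: str, new_id: str, effective: date) -> None:
        """Re-key a record and log the change so old IDs stay traceable."""
        record = self.records.pop(old_id)
        record.provider_id = new_id
        record.effective_date = effective
        self.records[new_id] = record
        self.id_crosswalk.append((old_id, new_id, effective))

    def archive_copy(self) -> "Frame":
        """Return an independent snapshot to store before a major update."""
        return copy.deepcopy(self)
```

Taking the snapshot before any update means the archive retains every active and inactive provider as it stood at that time, while the crosswalk lets later files be matched back to earlier identifiers.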
Bureau of Transportation Statistics. 2005. BTS Statistical Standards Manual, Section 2.2 (Target Population and Sample Design). Washington, DC.
Federal Committee on Statistical Methodology. 1990. Survey Coverage, Statistical Policy Working Paper 17, Washington, DC: Office of Management and Budget. Available at http://www.fcsm.gov/working-papers/wp17.html as of November 5, 2004.
__________. 2001. Measuring and Reporting Sources of Error in Surveys, Chapter 5 (Coverage Error), Statistical Policy Working Paper 31, Washington DC: Office of Management and Budget. Available at http://www.fcsm.gov/01papers/SPWP31_final.pdf as of December 20, 2004.
Office of Management and Budget (OMB). 2005. Standards for Statistical Surveys (Proposed), Section 2.1 (Developing Sampling Frames). Washington, DC. July 14.
Approval Date: April 20, 2005
Standard 3.3: Design and administer data collection methods and instruments to balance among:
- The maximization of data quality,
- The control of measurement error and bias due to missing data, and
- The minimization of respondent burden and cost.
In addition, if BTS promises confidentiality of respondents' data, then BTS must protect the privacy rights of the respondents and data providers, and protect their data from unauthorized disclosure.
Key Terms: confidentiality, data collection, Information Collection Request (ICR), key variable, measurement error, nonresponse bias, response rates
Guideline 3.3.1: Quality Assurance
Develop protocols to monitor data collection activities, with strategies to identify and correct problems to ensure quality during data collection:
- Implement a process control system during data collection to monitor data quality. The quality control system should be integrated into the data collection process, and enable staff to identify and resolve problems. The control system should also provide data quality measurements for use as indicators of data collection performance and data quality. Use a data tracking process to ensure that data are not lost when transferred to BTS.
- Use a verification process in data entry to ensure entry errors remain below a set limit based on data accuracy requirements. Include data verification rules in online or other electronic data collection systems.
- Conduct refresher training periodically for persons involved in interviewing, observing, or providing data to maintain proper procedures and standards.
- Track ongoing response rates and item nonresponse for key variables. Conduct an evaluation of potential item nonresponse bias if response rates (defined in Section 4.3.1) fall below 70 percent for core items (OMB 2005).
- Determine the core items to obtain when a respondent is unwilling to complete the whole information collection instrument. Target the core items to meet the minimum standard for unit response and to analyze nonresponse bias (Section 4.4).
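The tracking of item response for core items could be sketched as follows. The response rate definition itself is given in Section 4.3.1 and is not reproduced here, so this example assumes a simple unweighted rate (answered returns divided by all returns); the function names, field names, and sample data are hypothetical.

```python
# Hypothetical sketch: unweighted item response rates for core items,
# with a flag for items below the 70 percent guideline (OMB 2005).
from typing import Dict, Iterable, List, Mapping, Optional

ITEM_RESPONSE_THRESHOLD = 0.70


def item_response_rates(
    returns: Iterable[Mapping[str, Optional[object]]],
    core_items: Iterable[str],
) -> Dict[str, float]:
    """Share of returns with a non-missing value for each core item.

    A value of None, or an absent key, is treated as item nonresponse.
    """
    returns = list(returns)
    if not returns:
        raise ValueError("no returns to evaluate")
    return {
        item: sum(1 for r in returns if r.get(item) is not None) / len(returns)
        for item in core_items
    }


def items_below_threshold(
    rates: Mapping[str, float], threshold: float = ITEM_RESPONSE_THRESHOLD
) -> List[str]:
    """Core items whose response rate calls for a bias evaluation."""
    return sorted(item for item, rate in rates.items() if rate < threshold)


# Example with made-up returns: "mode" is answered on 3 of 4 returns,
# "tons" on 2 of 4.
sample = [
    {"mode": "truck", "tons": 12.0},
    {"mode": "rail", "tons": None},
    {"mode": "air"},
    {"mode": None, "tons": 3.5},
]
rates = item_response_rates(sample, ["mode", "tons"])
print(items_below_threshold(rates))  # ['tons']
```

Running such a check at regular intervals during collection, rather than only at closeout, leaves time to follow up on weak items while respondents can still be reached.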
Guideline 3.3.2: Encouraging Cooperation
To encourage data providers and respondents to participate, train data collection staff on obtaining cooperation, building rapport, and converting refusals, even for mandatory data collections. Response rates and data quality can also be improved through means such as the use of prenotification letters, multiple contacts, and reminder notices.
Guideline 3.3.3: Information Collection Request
Provide respondents with an Information Collection Request (ICR) when collecting information. The ICR is usually placed on the information collection instrument. Follow the requirements for ICRs given in the Information Collection section of the BTS Confidentiality Procedures Manual.
Guideline 3.3.4: Protecting Confidential Data
In all phases of data collection, confidential data must be protected from unauthorized access or release:
- Protect identifying information of respondents as collected or on the sample frame from unauthorized release or access.
- Ensure that controls are in place to prevent unauthorized access to electronic information collections (computer-assisted interviewing, web-based collections, or other electronic filing methods).
- Ensure that all data collection staff have received confidentiality training and signed a non-disclosure form prior to collecting data.
- Use secure means when handling and storing the data during collection to protect against disclosure.
- Use other means to protect confidential information as outlined in the Confidentiality Procedures Manual.
49 U.S.C. 111, as amended by the Safe, Accountable, Flexible, Efficient Transportation Equity Act: A Legacy for Users. P.L. 109-59.
Bureau of Transportation Statistics. 2004. Confidentiality Procedures Manual. Washington, DC.
__________. 2005. BTS Statistical Standards Manual, Section 2.3 (Data Collection Methods) and Chapter 4 (Processing of Data). Washington, DC.
Groves, R. 1989. Survey Errors and Survey Costs. New York, NY: Wiley, Chapters 10 and 11.
Office of Management and Budget (OMB). 2005. Standards for Statistical Surveys (Proposed), Section 2.3 (Data Collection Methodology). Washington, DC. July 14.
Privacy Act of 1974.
Confidential Information Protection and Statistical Efficiency Act (CIPSEA) of 2002, P.L. 107-347, Title V.
Approval Date: April 20, 2005
Standard 3.4: The data collection procedures should be documented both for internal staff reference and for the public. Documentation should be thorough enough to allow reproduction of the steps leading to the results.
Key Terms: external source, frame
Guideline 3.4.1: Documentation of External Data Sources
All data that BTS acquires from external sources must have adequate levels of documentation. Documentation for external sources should include:
- The organization providing data,
- The exact name of the data source,
- If the data were obtained from a publication, the full publication information and source for the data within the publication,
- If the data were acquired as a data file, how the file was obtained, the date obtained, and the cost (if any),
- If the data were obtained from the web, the web address and the date acquired,
- The best documentation available from the external data source on the data collection design (including sampling, if used), the data collection and processing procedures, any analysis or modeling performed, and any evaluations of the data quality,
- Information on whether the external data are confidential or proprietary, and if so, a copy of the written agreement used to obtain the data,
- Any additional notes on the interpretation and use of the data,
- Any personal communications required to obtain the data, information about the data source, or information about data quality, and
- Contact information for further questions.
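As an illustration only, the items above could be captured in a structured record with a completeness check. The field names and the conditional rules below are assumptions drawn from the list, not a required BTS format.

```python
# Hypothetical sketch: a documentation record for an external data source
# and a check that reports which listed items are still missing.
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class ExternalSourceRecord:
    organization: str
    source_name: str
    obtained_via: str                       # "publication", "file", or "web"
    contact: str
    publication_citation: Optional[str] = None
    file_how_obtained: Optional[str] = None
    date_acquired: Optional[str] = None     # e.g. "2005-01-19"
    cost: Optional[str] = None              # recorded only if any
    web_address: Optional[str] = None
    source_documentation: Optional[str] = None
    is_confidential: bool = False
    agreement_copy: Optional[str] = None    # location of written agreement
    notes: str = ""
    communications: str = ""


def missing_items(rec: ExternalSourceRecord) -> List[str]:
    """List documentation items that the record does not yet provide."""
    missing = []
    for name in ("organization", "source_name", "contact", "source_documentation"):
        if not getattr(rec, name):
            missing.append(name)
    if rec.obtained_via == "publication" and not rec.publication_citation:
        missing.append("publication_citation")
    if rec.obtained_via == "file":
        if not rec.file_how_obtained:
            missing.append("file_how_obtained")
        if not rec.date_acquired:
            missing.append("date_acquired")
    if rec.obtained_via == "web":
        if not rec.web_address:
            missing.append("web_address")
        if not rec.date_acquired:
            missing.append("date_acquired")
    if rec.is_confidential and not rec.agreement_copy:
        missing.append("agreement_copy")
    return missing
```

A record that returns an empty list covers the conditional items in the guideline; whether the documentation is adequate in substance still requires review by staff.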
Guideline 3.4.2: Frame Maintenance Documentation
Documentation for maintaining and updating frames must be written and revised as necessary. The documentation must include:
- The frequency of routine maintenance and major updates,
- Sources of information used for maintenance and updates,
- Procedures for incorporating the results of the updates on all appropriate files, mailing lists, and other data collection control forms or listings,
- Summary of results of the frame maintenance and updates, and
- The results of periodic coverage studies.
Guideline 3.4.3: Documentation of Data Collection Operations
The data collection operations documentation should include:
- The method of data collection (e.g., mail, telephone, or Internet), including methods used to track and follow up delinquent reports,
- The data collection period, response achieved by the end of the period, and final response achieved,
- Copies of materials used in the data collection, including instructions given to data providers,
- Copies of materials used in training data collection and data provider staff,
- Schedule of data collection operations,
- Any response analysis or other validation surveys conducted for new data collection efforts, and
- Quantification of response errors to the extent possible.
Office of Management and Budget. 2002. Guidelines for Ensuring and Maximizing the Quality, Objectivity, Utility, and Integrity of Information Disseminated by Federal Agencies. Federal Register, Vol. 67, No. 36, pp. 8450-8460. Washington, DC. February 22.