Given the collection design, the next phase in the data acquisition process
is the collection process itself. This collection process can be a one-time
execution of a survey, a monthly (or other periodic) data collection, a continuous
reporting of incident data, or a compilation of data already collected by one
or more third parties. The physical details of carrying out the collection are
critical to making the collection design a reality.
3.1 Data Collection Operations
- The collection "instruments" are forms, questionnaires, automated
collection screens, and file layouts used to collect the data. They consist
of sets of questions or annotated blanks on paper or computer that request
information from data suppliers. They should be designed to communicate as
clearly as possible with the data supplier.
- Data collection includes all the processes involved in carrying out the
data collection design to acquire data. Data collection operations can have
a high impact on the ultimate data quality.
- The data collection method should be appropriate to the data complexity,
collection size, data requirements, and amount of time available.
Examples: A reporting collection will rely partially on the required reporting
process, but will also follow up on missing data. Similarly, a large survey
requiring a high response rate will often start with a mailout, followed
by telephone contact, and finally by a personal visit.
- Specific data collection environmental choices can significantly affect
error introduced at the collection stage.
For example, if the data collector is collecting as a collateral duty or is
working in an uncomfortable environment, the quality of the data collected
may suffer. Likewise, if the data are particularly difficult to collect,
that difficulty will affect the data quality.
- Conversion of data on paper to electronic form (e.g., key entry, scanning)
introduces a certain amount of error which must be controlled.
- Third party sources of data may introduce some degree of error in their
- Collection instruments are clearly defined for data suppliers, with entries
in a logical sequence, reasonable visual cues, and limited skip patterns.
Instructions should help minimize missing data and response error.
- A status tracking procedure should be used to ensure that data are not
lost in mailings, file transfers, or collection handling. A tracking system
for incoming third-party data should ensure that all required data are received.
- Data entry of paper forms should have a verification process ensuring that
data entry errors remain below set limits based on data accuracy requirements.
For example, the verification samples of key entry forms can be based on an
average outgoing quality limit for batches of forms. A somewhat more expensive
approach would be 100 percent verification.
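The sample-based verification described above can be sketched as follows. This is a minimal illustration, not a prescribed procedure: the function name `verify_batch`, the record layout, and the 1 percent error limit are hypothetical, and an actual plan would derive the sample size and acceptance rule from the chosen average outgoing quality limit.

```python
import random

def verify_batch(keyed, rekey, sample_size, max_error_rate, seed=0):
    """Independently re-key a random sample of forms and compare the results.

    keyed -- dict mapping a form id to the originally keyed record
    rekey -- callable returning the independently re-keyed record for a form id
    Returns (error_rate, accepted).
    """
    rng = random.Random(seed)
    form_ids = sorted(keyed)
    sample = rng.sample(form_ids, min(sample_size, len(form_ids)))
    if not sample:
        return 0.0, True  # nothing to verify in an empty batch
    errors = sum(1 for form_id in sample if keyed[form_id] != rekey(form_id))
    error_rate = errors / len(sample)
    return error_rate, error_rate <= max_error_rate

# Hypothetical batch of 200 forms; the second keying disagrees on form 7 only.
batch = {i: {"count": i} for i in range(200)}
second_keying = lambda form_id: {"count": -1} if form_id == 7 else batch[form_id]

# A sample size equal to the batch size corresponds to 100 percent verification.
rate, accepted = verify_batch(batch, second_keying, sample_size=200,
                              max_error_rate=0.01)
```

A batch that fails the check would typically be re-keyed or fully verified before release; that decision rule is outside this sketch.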
- Make the data collection as easy as possible for the collector.
- If interviewers or observers are used, a formal training process should
be established to ensure proper procedures are followed.
- Data calculations and conversions at the collection level should be minimized.
For example, if a bus driver is counting passengers, they should not be doing
calculations such as summations. The driver should record the raw counts and
calculations should be performed where they are less likely to result in mistakes.
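As a minimal sketch of this division of labor, the collector records only raw tallies and the summation is done later, away from the point of collection. The record layout and the values below are made up for illustration.

```python
from collections import defaultdict

# Raw tallies exactly as recorded in the field: (route, stop, boardings).
raw_counts = [
    ("Route 5", "Stop A", 12),
    ("Route 5", "Stop B", 7),
    ("Route 9", "Stop A", 4),
]

def total_by_route(records):
    """Sum boardings per route after collection, not at the point of collection."""
    totals = defaultdict(int)
    for route, _stop, boardings in records:
        totals[route] += boardings
    return dict(totals)

totals = total_by_route(raw_counts)
```

Keeping the raw tallies also means a total can be recomputed or audited later if an error is suspected.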
- The collection operation procedures should be documented and clearly posted
with the data, or with disseminated output from the data. If third party
data collection is used, procedures used by the third party should be provided
3.2 Missing Data Avoidance
- Some missing data occur in almost any data collection effort. Unit-level
missing data occur when a report that should have been received is completely
missing or is received and cannot be used (e.g., garbled data, missing key
variables). Item-level missing data occur when data are missing for one or
more items in an otherwise complete report.
For example, for an incident report on a hazardous material spill, unit-level
missing data occur if the report was never sent in. They also occur if the
report was sent in, but all entries were obliterated. Item-level missing data
occur if the report was complete, except that it did not indicate the quantity
spilled.
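The distinction between unit-level and item-level missing data can be sketched as a simple classification of each expected report. The field names below (`report_id`, `incident_date`, `material`, `quantity`, `location`) are hypothetical; an actual collection would use its own key variables and items.

```python
REQUIRED_KEYS = ("report_id", "incident_date")   # hypothetical key variables
ITEMS = ("material", "quantity", "location")     # hypothetical data items

def is_blank(value):
    """Treat None and empty strings as missing."""
    return value is None or value == ""

def classify_report(report):
    """Return (status, missing_items) for one expected report.

    report is None when nothing was received, otherwise a dict of field values.
    status is 'unit-missing', 'item-missing', or 'complete'.
    """
    if report is None or any(is_blank(report.get(k)) for k in REQUIRED_KEYS):
        return "unit-missing", []   # absent, or unusable without key variables
    missing = [name for name in ITEMS if is_blank(report.get(name))]
    return ("item-missing" if missing else "complete"), missing

spill_report = {"report_id": "R1", "incident_date": "2001-05-01",
                "material": "diesel", "location": "Mile 14"}
status, missing_items = classify_report(spill_report)
```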
- The extent of unit-level missing data can sometimes be difficult to determine.
If a report should be sent in whenever a certain kind of incident occurs,
then non-reporters can only be identified if crosschecked with other data
sources. On the other hand, if companies are required to send in periodic
reports, the previous period may provide a list of the expected reporters
for the current period.
Both can also be true for item-level missing data. For example, in a travel
survey asking for trips made, forgotten trips would not necessarily be known.
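For periodic reports, the comparison against the previous period can be sketched as a set difference. The reporter names are invented, and the result is only a starting list for follow-up, since a reporter may legitimately have left the population.

```python
def missing_reporters(previous_period, current_period):
    """Return reporters seen last period but not yet received this period."""
    return sorted(set(previous_period) - set(current_period))

previous = ["Carrier A", "Carrier B", "Carrier C"]
received = ["Carrier A", "Carrier C", "Carrier D"]

to_follow_up = missing_reporters(previous, received)
```

New reporters such as "Carrier D" do not appear in the result; identifying them requires a separate check against the current frame.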
- Some form of missing data follow-up will dramatically reduce the incidence
of both unit-level and item-level missing data.
For example, a process to recontact the data source can be used, especially
when critical data are left out. A series of recontacts may be used for unit
nonresponse. Incident reporting collections can use some form of cross-check
with other data sources to detect when incidents occur, but are not reported.
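A cross-check of this kind can be sketched as a match between an external list of events and the incident reports received. Matching on an exact (date, location) pair is an assumption made for brevity; real sources would usually need looser matching rules, and the records shown are invented.

```python
def unreported_incidents(external_events, reported_events):
    """Return external events that have no matching incident report.

    Events are matched on the exact (date, location) pair.
    """
    reported_keys = {(r["date"], r["location"]) for r in reported_events}
    return [e for e in external_events
            if (e["date"], e["location"]) not in reported_keys]

external = [
    {"date": "2001-03-02", "location": "Mile 14"},
    {"date": "2001-03-09", "location": "Mile 30"},
]
reported = [
    {"date": "2001-03-02", "location": "Mile 14", "quantity": 50},
]

gaps = unreported_incidents(external, reported)
```

Each entry in `gaps` is a candidate for follow-up with the responsible reporter, not proof that a report is missing.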
- When data are supplied by a third-party data collector, some initial data
check and follow-up for missing data will dramatically reduce the incidence
of missing data.
- All data collection programs should have some follow-up of missing reports
and data items, even if the data are provided by third-party sources.
For example, for surveys and periodic reports, it is easy to tell what is
missing at any stage and institute some form of contact (e.g., mail out, telephone
contact, or personal visit) to fill in the missing data. For incident reports,
it is a little more difficult, as a missing report may not be obvious.
- For incident reporting collections where missing reports may not be easily
tracked, some form of checking process should exist to reduce missing reports.
- For missing data items the data collection owner should distinguish between:
critical items like items legally required or otherwise important items
(e.g., items used to measure DOT or agency performance).
- The missing data avoidance procedures should be documented and clearly
posted with the data, or with disseminated output from the data.
- Data collection program design documentation should address how the collection
process was designed to produce high rates of response.
- If data are collected by a third party, the data collection program documentation
should indicate how the third party deals with missing data, if that documentation