Chapter 5. Using the Data
Appendix E provides the abbreviations used in this report, key travel concepts, and a glossary of terms used in the 2001 NHTS. The Travel Concepts portion of Appendix E is geared primarily toward data users who are not familiar with household travel survey data. However, it may also be useful to transportation planning professionals, because the exact usage of certain travel terms and concepts often varies from survey to survey.
Weighting the Data
Chapter 2 described how the weights were constructed for the 2001 NHTS Pre-9/11 and Post-9/11 data sets. The weights reflect the selection probabilities and adjustments to account for nonresponse, undercoverage, and multiple telephones in a household. These weights must be used to obtain estimates that are minimally biased. Unweighted tabulations may differ significantly from weighted estimates and may be subject to large bias. Estimates of totals are obtained by multiplying each data value by the appropriate weight and summing the results.
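The weighted-total rule above is a single multiply-and-sum. The minimal Python sketch below shows it, together with a weighted mean (weighted total divided by the sum of the weights) for contrast with an unweighted tabulation. The function names, the "vehicles per household" variable, and all numbers are hypothetical and chosen only for illustration; in practice the weight must be the one appropriate to the file and unit of analysis.

```python
from typing import Sequence


def weighted_total(values: Sequence[float], weights: Sequence[float]) -> float:
    """Estimate a population total: sum of (data value * weight) over all records."""
    if len(values) != len(weights):
        raise ValueError("values and weights must have the same length")
    return sum(v * w for v, w in zip(values, weights))


def weighted_mean(values: Sequence[float], weights: Sequence[float]) -> float:
    """Weighted mean: the weighted total divided by the sum of the weights."""
    total_weight = sum(weights)
    if total_weight == 0:
        raise ValueError("weights sum to zero")
    return weighted_total(values, weights) / total_weight


# Hypothetical records: vehicles per household and each household's weight.
vehicles = [0, 2, 3]
hh_weights = [1500.0, 800.0, 200.0]

print(weighted_total(vehicles, hh_weights))  # 2200.0
print(weighted_mean(vehicles, hh_weights))   # 0.88
print(sum(vehicles) / len(vehicles))         # unweighted mean, about 1.67
```

In this made-up example the unweighted mean (about 1.67) is nearly twice the weighted mean (0.88), which is the kind of distortion the weights are meant to correct.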
The long-distance trip data in this file cannot simply be used to produce realistic distributions of households or persons by number of annual trips. The survey records the number of trips taken in a 28-day travel period. For example, if a person reports taking two long-distance trips in the 28-day travel period, we have no direct knowledge of how many trips that person takes in a year. A simple estimate of the number of annual trips is 26 (2 × 365/28 ≈ 26.1), but the person quite likely took fewer trips than this during the year. Similarly, if a person reports taking zero long-distance trips in the 28-day travel period, the simple estimate of annual trips is also zero, although the person may well have taken a few trips during the year.
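The simple annual estimate discussed above is just the 28-day count scaled by 365/28 (about 13.04 travel periods per year). The short Python sketch below reproduces that arithmetic only; the function name is hypothetical. It makes the problem visible: a single 28-day observation can only expand into coarse annual values (0, about 13, about 26, ...), which is why it does not yield realistic per-person annual distributions.

```python
DAYS_PER_YEAR = 365
TRAVEL_PERIOD_DAYS = 28


def naive_annual_trips(trips_in_period: int) -> float:
    """Scale a 28-day long-distance trip count to a year (hypothetical helper).

    This is only the 'simple estimate': it rescales one 28-day observation and
    says nothing about the person's actual trips in the other periods of the year.
    """
    return trips_in_period * DAYS_PER_YEAR / TRAVEL_PERIOD_DAYS


for reported in (0, 1, 2):
    print(reported, round(naive_annual_trips(reported), 1))
# 0 0.0
# 1 13.0
# 2 26.1
```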
Evaluating the Impact of 9/11 on Long-distance Travel Patterns
The Pre-9/11 and Post-9/11 long-distance trip data files may be used to estimate annual long-distance trips at the national level for the Pre-9/11 and Post-9/11 time frames, respectively. However, some long-distance trips taken after 9/11 appear in the Pre-9/11 long-distance trip data file, and some long-distance trips taken before 9/11 appear in the Post-9/11 file. An analyst may want to delete such trips from the two files. The variable TPBOA911 (travel period before, or on or after, 9/11) can be used for this purpose.
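The filtering step above can be sketched as keeping, in each file, only the records whose TPBOA911 value matches that file's time frame. This is a minimal Python sketch under a loud assumption: the numeric codes 1 (before 9/11) and 2 (on or after 9/11) are hypothetical placeholders, so the real values must be taken from Appendix B, Codebook. Records are modeled as plain dicts, and TRIPID is an illustrative field name.

```python
from typing import Dict, List

# HYPOTHETICAL codes for TPBOA911; look up the real values in Appendix B (Codebook).
BEFORE_911 = 1
ON_OR_AFTER_911 = 2


def keep_travel_period(trips: List[Dict], wanted_code: int,
                       var: str = "TPBOA911") -> List[Dict]:
    """Return only the trip records whose travel-period code equals wanted_code.

    Records with any other value (including reserve codes such as -9) are dropped.
    """
    return [trip for trip in trips if trip.get(var) == wanted_code]


# Hypothetical trip records; TRIPID is an illustrative field name.
pre_file = [
    {"TRIPID": 1, "TPBOA911": BEFORE_911},
    {"TRIPID": 2, "TPBOA911": ON_OR_AFTER_911},  # travel period after 9/11
]
post_file = [
    {"TRIPID": 3, "TPBOA911": ON_OR_AFTER_911},
    {"TRIPID": 4, "TPBOA911": BEFORE_911},  # travel period before 9/11
]

pre_clean = keep_travel_period(pre_file, BEFORE_911)
post_clean = keep_travel_period(post_file, ON_OR_AFTER_911)
print([t["TRIPID"] for t in pre_clean], [t["TRIPID"] for t in post_clean])  # [1] [3]
```

Because the filter keeps only an exact match, records carrying a reserve code in TPBOA911 are dropped as well; an analyst who prefers to inspect those records separately could filter them out before applying this step.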
The person-level data file can be merged with the corresponding long-distance trip data file by household ID and person ID to combine personal information with long-distance trip information.
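The merge described above is a join on the pair (household ID, person ID). The Python sketch below illustrates it with plain dicts and no external libraries. The key names HOUSEID and PERSONID, and the fields AGE and TRIPID, are assumptions for illustration only; confirm the actual variable names in Appendix B, Codebook.

```python
from typing import Dict, List, Tuple

# HYPOTHETICAL key names; confirm the actual household and person ID
# variable names in Appendix B (Codebook).
HH_KEY = "HOUSEID"
PERSON_KEY = "PERSONID"


def merge_person_trips(persons: List[Dict], trips: List[Dict]) -> List[Dict]:
    """Attach each trip to its person record, matching on (household ID, person ID).

    Returns one combined dict per matched trip (an inner join: trips with no
    matching person are skipped). Person fields are copied first, so a trip
    field with the same name overrides the person field.
    """
    index: Dict[Tuple, Dict] = {}
    for person in persons:
        key = (person[HH_KEY], person[PERSON_KEY])
        if key in index:
            raise ValueError(f"duplicate person key {key}")
        index[key] = person

    merged = []
    for trip in trips:
        person = index.get((trip[HH_KEY], trip[PERSON_KEY]))
        if person is not None:
            merged.append({**person, **trip})
    return merged


# Hypothetical records.
persons = [
    {"HOUSEID": 101, "PERSONID": 1, "AGE": 45},
    {"HOUSEID": 101, "PERSONID": 2, "AGE": 17},
]
trips = [
    {"HOUSEID": 101, "PERSONID": 1, "TRIPID": 1},
    {"HOUSEID": 101, "PERSONID": 1, "TRIPID": 2},
]
for row in merge_person_trips(persons, trips):
    print(row)
```

The duplicate-key check guards against silently attaching trips to the wrong person record if the person file is not unique on the two IDs.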
Calculating the Standard Error
The replicate weights may be used to calculate standard errors. In replicate variance estimation, the estimate is recomputed for each of a number of subsamples (replicates) of the full survey sample. One then takes the difference between each replicate estimate and the full-sample estimate and squares it. Finally, one sums the squared differences across all the replicates and multiplies the sum by an appropriate factor. Replicate weights in the NHTS Pre-9/11 and Post-9/11 data sets were constructed using the delete-one jackknife method (Wolter, K.M. 1985. Introduction to Variance Estimation. New York: Springer-Verlag). These weights can be used to calculate standard error estimates using WesVar or SUDAAN. Standard error estimates can also be easily calculated using the following formula:

$$\mathrm{SE}(x) = \sqrt{\frac{98}{99}\sum_{i=1}^{99}\bigl(x - \mathrm{REP}(i)\bigr)^{2}}$$

where x is the full-sample estimate (calculated using the full-sample weights), REP(i) is the estimate calculated using the i-th set of replicate weights, and the summation over the index i runs from 1 to 99. For a brief introduction to delete-one jackknife variance estimation and the above formula, see Raj, D. and Chandhok, P. K. 1998. Sample Survey Theory. London, U.K.: Narosa Publishing House.
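The calculation described above is short enough to write directly. This is a minimal Python sketch, assuming the replicate estimates REP(1) through REP(99) have already been produced by rerunning the same estimator once with each replicate weight in place of the full-sample weight. The multiplier (R − 1)/R is the standard delete-one jackknife factor; with R = 99 replicates it equals 98/99. The function name and the three-replicate numbers in the usage lines are hypothetical and serve only to check the arithmetic.

```python
import math
from typing import Sequence


def jackknife_se(full_estimate: float, replicate_estimates: Sequence[float]) -> float:
    """Delete-one jackknife standard error.

    SE = sqrt( (R - 1)/R * sum_i (x - REP(i))^2 ), where R is the number of
    replicate estimates. With the 99 NHTS replicates the factor is 98/99.
    """
    r = len(replicate_estimates)
    if r < 2:
        raise ValueError("need at least two replicate estimates")
    squared_diffs = sum((full_estimate - rep) ** 2 for rep in replicate_estimates)
    return math.sqrt((r - 1) / r * squared_diffs)


# Toy check with three hypothetical replicates: squared differences 4 + 1 + 4 = 9,
# times 2/3 gives 6, so the standard error is sqrt(6).
print(round(jackknife_se(100.0, [98.0, 101.0, 102.0]), 3))  # 2.449
```

With real data, each replicate estimate would come from the same weighted computation as x (for example, a weighted total) with the i-th replicate weight substituted for the full-sample weight.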
Data File Conventions
A number of conventions are followed throughout the NHTS data files. Some of these are also listed in Appendix B, Codebook, and they include:
- Yes/no questions - coded either as 1 = yes, 2 = no, or as 1 = yes, 0 = no.
- Calendar dates - Several variables contain dates; the year and month are usually shown as YYYYMM (year followed by month).
- Times - All reported time variables are in military (24-hour) time, from 0000 to 2359.
- Reserve codes - On the ASCII file, the reserve codes -1, -8, -7, and -9 were used to indicate legitimate skips, don't knows, refusals, and not-ascertained values, respectively.
- Legitimate skip codes - Questions intentionally skipped in the instrument were generally denoted by a -1 in the field.
- Don't know - When the respondent indicated that they did not know the response to a question, it was denoted by a -8 in the field.
- Refused - When a respondent refused to provide a response to a question, it was denoted by a -7 in the field.
- Not ascertained - When a question should have been asked of the respondent but was not (that is, the question was not a legitimate skip (code -1) for that respondent), or when the response provided failed an edit check and could not be corrected, the response was set to not ascertained. A not ascertained value is denoted by a -9 in the field.
- Missing information for derived variables
  - If a derived variable was derived from just one primary variable, its missing values are identical to those of the primary variable and could be -1, -7, -8, or -9.
  - If a derived variable was derived from multiple variables, its missing values are -1 or -9; that is, responses of -7 or -8 were set to -9.
  - If a derived variable was not derived from a CATI variable (for example, the weight variables), missing values are coded as follows:
    - . = missing value for a numeric derived variable
    - Blank = missing value for a character derived variable
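The conventions listed above translate naturally into a few small cleaning helpers. The Python sketch below is one possible reading of them, not an official NHTS tool: all function names are hypothetical, values are assumed to have already been read from the ASCII file as integers (or as raw strings for the non-CATI derived fields), and the yes/no helper assumes that a 0 never appears in a field that uses the 1/2 coding.

```python
from datetime import time
from typing import Optional, Tuple

# Reserve codes used on the ASCII file, as listed above.
RESERVE_CODES = {
    -1: "legitimate skip",
    -7: "refused",
    -8: "don't know",
    -9: "not ascertained",
}


def clean_value(value: int) -> Optional[int]:
    """Return None if value is a reserve code, otherwise return it unchanged."""
    return None if value in RESERVE_CODES else value


def yes_no(value: int) -> Optional[bool]:
    """Map either yes/no coding (1/2 or 1/0) to True/False; reserve codes -> None."""
    return {1: True, 2: False, 0: False}.get(value)


def parse_yyyymm(value: int) -> Tuple[int, int]:
    """Split a YYYYMM calendar date such as 200104 into (year, month)."""
    year, month = divmod(value, 100)
    if not 1 <= month <= 12:
        raise ValueError(f"invalid month in {value}")
    return year, month


def parse_military_time(value: int) -> time:
    """Convert a 0000-2359 military time such as 1345 into a datetime.time."""
    hour, minute = divmod(value, 100)
    return time(hour, minute)  # time() raises ValueError if hour > 23 or minute > 59


def clean_non_cati(raw: str) -> Optional[str]:
    """For non-CATI derived fields, treat '.' (numeric) or blank (character) as missing."""
    text = raw.strip()
    return None if text in ("", ".") else text


print(clean_value(-8), yes_no(2), parse_yyyymm(200104), parse_military_time(1345))
# None False (2001, 4) 13:45:00
```

Running clean_value before the date and time parsers keeps reserve codes from being misread as dates; a reserve code passed straight to parse_yyyymm or parse_military_time raises ValueError instead.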