Chapter 2. Weight Construction
Weights are needed to produce valid population-level estimates, so that the survey results represent the population as a whole. Adjustment and poststratification are performed on the collected data to reduce the bias of estimates. Poststratification reweights the data so that the weighted characteristics of the respondents match the characteristics of the population.
Initial Pre-9/11 and Post-9/11 Person Data Sets
The 2001 NHTS person-level data file was separated into two data sets, the Pre-9/11 group of respondents and the Post-9/11 group of respondents, based on an assigned travel day of September 25, 2001. A person whose assigned travel day was on or before 9/25/01 is in the Pre-9/11 group, while a person whose assigned travel day was after 9/25/01 is in the Post-9/11 group. The rationale for selecting September 25, 2001 as the cutoff is explained below. The file was separated only by time frame; all information and variables, including the weights, were kept intact.
Data were weighted at the person level so that the survey respondents and their demographics and characteristics would accurately represent the characteristics and demographics of the population at large.
For example, suppose 47% of the survey respondents were male (an illustrative figure, not the actual one) while 49% of the U.S. population was known to be male. The male respondents would then be weighted more heavily, so that their data and travel information would count for 49% of the population's travel patterns. Likewise, the female respondents' data would be weighted less heavily, so that their 53% of the survey results would represent the 51% of the population that is female.
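The arithmetic behind this illustration can be sketched in a few lines of Python. The shares are the illustrative figures from the example, not actual NHTS values, and the function name is hypothetical.

```python
def poststratification_factors(sample_share, population_share):
    """Return, for each group, the factor that makes the weighted
    sample share equal to the population share."""
    return {g: population_share[g] / sample_share[g] for g in sample_share}

# Illustrative shares from the example (not actual survey figures).
sample_share = {"male": 0.47, "female": 0.53}
population_share = {"male": 0.49, "female": 0.51}

factors = poststratification_factors(sample_share, population_share)

# After applying the factors, each group's weighted share matches the population.
weighted_share = {g: sample_share[g] * factors[g] for g in sample_share}
```

Under these figures, each male respondent's weight is scaled up by a factor of about 1.04 (0.49/0.47) and each female respondent's weight is scaled down by a factor of about 0.96 (0.51/0.53).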
The number of respondents in the Pre-9/11 group was 22,204, while the Post-9/11 group had 38,078 respondents. In addition, the composition of the two groups was different. To see whether the Pre- and Post-9/11 groups were representative samples of the national population, we constructed population control totals for key characteristics that were related to key survey variables. The control totals were obtained independently by adjusting the Census 2000 numbers for growth between 2000 and September 2001, which was the midpoint of the survey. The control totals used for reweighting the Pre-9/11 data set are given in Table 1 of Appendix B, while the control totals for the Post-9/11 data set are given in Table 2 of Appendix B. The survey estimates of the number of Hispanics in the United States, the number of Blacks in the United States, and other variables differed from the population control totals. To make each of the Pre-9/11 and Post-9/11 groups a nationally representative sample, we poststratified each group to the population control totals.
Reweighting Questions and Answers
Q: Why do we need to reweight the new data sets?
A: The original survey was weighted on the different variables over the entire 14-month period to correspond with the population at large. Once the data are split, the original weights cannot be used in either subsample to accurately represent the population at large. (For example, in the first half of the survey 48.39% of those interviewed were male, which is less than the 48.80% of respondents over the entire study, while in the second half 49.11% of respondents were male. For this reason our initial weight, which is calibrated to 48.80% male respondents, would not be an accurate weight for either subsample.) The same principle applies to the respondents' other characteristics before and after 9/11. The weights used over the entire study were representative of the characteristics of respondents over the entire study, but not of the Pre-9/11 respondents' characteristics or of the Post-9/11 respondents' characteristics. The whole is equal to the sum of the parts, but in this case the parts were not equivalent. The respondents' characteristics before and after 9/11 were not equivalent, and therefore the responses needed to be reweighted.
Q: Why was September 25, 2001 chosen as the cutoff for the Pre- and Post-9/11 data sets?
A: There is no way to divide the long-distance trip data by the exact date of September 11, 2001, and then adjust the corresponding weights, for two reasons:
a. It is impossible to adjust the weights directly at the long-distance trip level, so we have to adjust the weights of the Pre-9/11 and Post-9/11 data files at the person level first and then incorporate the person-level weights into the long-distance trip data files.
b. The NHTS did not collect long-distance trip data from all household members for the entire data collection period (March 2001 to May 2002). Instead, people were asked to report their long-distance trips for a 4-week period prior to and including their randomly assigned travel day. Because a cross-sectional sample of people was interviewed throughout the data collection period, these 4-week reference periods are spread out across the data collection period. This means that most survey respondents were asked only about trips taken either before or after September 11, 2001, with a small number of people whose reference periods spanned September 11.
This made it necessary for us to choose a date that separates those whose reference period was before September 11 from those whose reference period was after September 11, and that evenly divides the group whose reference period includes September 11. The assigned travel day of 9/25/2001 was chosen as the cutoff point to achieve this. Respondents whose assigned travel day was 9/25 were asked about all long-distance trips taken in the 28-day period ending on that day, that is, between and including August 29, 2001, and September 25, 2001. September 11 falls at the midpoint of that interval, with 13 days before it and 14 days after it. This means that some long-distance trips taken after 9/11 are in the Pre-9/11 long-distance trip data file and some long-distance trips taken before 9/11 are in the Post-9/11 long-distance trip data file.
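The date arithmetic behind the cutoff can be checked with a short Python sketch. It assumes only what is stated above, namely a 28-day reference period that ends on and includes the assigned travel day; the function and variable names are hypothetical.

```python
from datetime import date, timedelta

REFERENCE_DAYS = 28  # 4-week reference period, including the assigned travel day

def reference_window(travel_day):
    """Return (first_day, last_day) of the 28-day long-distance reference period."""
    return travel_day - timedelta(days=REFERENCE_DAYS - 1), travel_day

cutoff = date(2001, 9, 25)
event = date(2001, 9, 11)

first_day, last_day = reference_window(cutoff)
days_before = (event - first_day).days  # days in the window strictly before 9/11
days_after = (last_day - event).days    # days in the window strictly after 9/11

print(first_day, days_before, days_after)  # 2001-08-29 13 14
```

For a respondent whose assigned travel day is the cutoff, the window opens on August 29, 2001, and September 11 sits in the middle of it, which is why this travel day splits the spanning group roughly evenly.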
For assessing the effect of 9/11, one can delete trips taken after 9/11 by persons in the Pre-9/11 group by using the variable TPBOA911 (travel period before, on, or after 9/11). Similarly, one can delete trips taken on or before 9/11 by persons in the Post-9/11 group.
Chapter 5 in this documentation will explain how to deal with these cases in more detail.
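As a rough illustration of this filtering, the sketch below drops the mismatched trips from each group. The actual coding of TPBOA911 is not given in this chapter, so the three labels used here are placeholders that would need to be replaced with the values from the codebook; the function name and the trip_id field are also hypothetical.

```python
# ASSUMPTION: these labels stand in for the real TPBOA911 codes, which are
# defined in the survey codebook and are not given in this chapter.
BEFORE, ON, AFTER = "before", "on", "after"

def trips_for_911_comparison(trips, group):
    """Drop trips whose timing contradicts the person's group ('pre' or 'post')."""
    if group == "pre":
        # Pre-9/11 persons: delete trips taken after 9/11.
        return [t for t in trips if t["TPBOA911"] != AFTER]
    if group == "post":
        # Post-9/11 persons: delete trips taken on or before 9/11.
        return [t for t in trips if t["TPBOA911"] == AFTER]
    raise ValueError("group must be 'pre' or 'post'")

# Made-up trips for one person, one per timing category.
example_trips = [
    {"trip_id": 1, "TPBOA911": BEFORE},
    {"trip_id": 2, "TPBOA911": ON},
    {"trip_id": 3, "TPBOA911": AFTER},
]
kept_pre = trips_for_911_comparison(example_trips, "pre")
kept_post = trips_for_911_comparison(example_trips, "post")
```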
Adjustment for Data Splitting
After the data set was divided, the sum of the weights remaining in each of the Pre-9/11 and Post-9/11 person data files was no longer equal to the population total. To adjust for this, the weights in each data file were multiplied by a factor specific to that file.
Factor = Population total / Sum of the original weights, where
Population total = 277,208,169
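A minimal sketch of this adjustment in Python follows. The population total is the one given above; the sample weights and the function name are made up for illustration.

```python
POPULATION_TOTAL = 277_208_169  # population total stated in this chapter

def rescale_weights(weights, population_total=POPULATION_TOTAL):
    """Multiply every weight by population_total / sum(weights)."""
    factor = population_total / sum(weights)
    return [w * factor for w in weights], factor

# Made-up original weights left in one of the two files after the split.
original = [1200.0, 850.5, 2300.25]
rescaled, factor = rescale_weights(original)
```

After rescaling, the weights in the file sum to the population total (up to floating-point rounding), and the relative sizes of the weights are unchanged.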
The next step in recalculating the usable person weight was to match survey estimates to independent controls for various demographic categories, in a process called raking. In this study, controls are used to match the survey respondents' measured characteristics to what is known about the occurrence of those characteristics in the population at large, so that the respondents' weights accurately represent the travel patterns of the entire population.
Eight dimensions were used in the raking process: race, ethnicity, race by month, ethnicity by month, sex by age, census region, MSA status, and month by day of the week. These dimensions were chosen because the Pre- and Post-9/11 groups differed from the population on them and because control totals for these variables were available from Census 2000. The control totals were constructed separately for the Pre- and Post-9/11 groups by adjusting Census estimates for growth between 2000 and 2001, when the majority of NHTS data collection was done, using estimates from the Census Bureau's Current Population Survey.
Weights were first adjusted to ensure agreement on the first raking dimension, then adjusted for the second raking dimension, then for the third, and so on. This cycle was repeated, controlling for each dimension in turn, until close agreement was obtained on all of the dimensions simultaneously. The raking process was done separately for the Pre-9/11 and Post-9/11 data files.
The variables and the control totals are provided in Appendix D, along with the average adjustment factor for each category.
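The iterative adjustment described above is commonly known as iterative proportional fitting. The sketch below illustrates the idea on toy data with two dimensions rather than the eight used in the study. The records, control totals, stopping tolerance, and all names are assumptions for illustration, not the software or settings actually used.

```python
def rake(records, weights, controls, max_iter=100, tol=1e-8):
    """Rake weights to control totals, one dimension at a time.

    records  : list of dicts mapping dimension name -> category
    weights  : list of starting weights, in the same order as records
    controls : dict mapping dimension name -> {category: control total}
    """
    weights = list(weights)
    for _ in range(max_iter):
        max_gap = 0.0
        for dim, targets in controls.items():
            # Current weighted total in each category of this dimension.
            totals = {cat: 0.0 for cat in targets}
            for rec, w in zip(records, weights):
                totals[rec[dim]] += w
            # Largest relative gap seen before adjusting this dimension.
            gap = max(abs(targets[c] - totals[c]) / targets[c] for c in targets)
            max_gap = max(max_gap, gap)
            # Scale each record so this dimension matches its control totals.
            for i, rec in enumerate(records):
                weights[i] *= targets[rec[dim]] / totals[rec[dim]]
        # Stop once a full pass found every dimension already in agreement.
        if max_gap < tol:
            break
    return weights

# Toy data: four respondents, two raking dimensions (made-up control totals).
records = [
    {"sex": "M", "region": "NE"},
    {"sex": "M", "region": "S"},
    {"sex": "F", "region": "NE"},
    {"sex": "F", "region": "S"},
]
controls = {
    "sex": {"M": 49.0, "F": 51.0},
    "region": {"NE": 40.0, "S": 60.0},
}
raked = rake(records, [1.0, 1.0, 1.0, 1.0], controls)
```

Adjusting one dimension generally disturbs agreement on the others, which is why the passes are repeated until every dimension agrees with its control totals at the same time.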
A final step was to trim very large weights, which were a byproduct of the raking process. Inordinately large weights tend to substantially increase sampling variance. Keeping weights small reduces sampling variance, at the cost of giving up some of the bias reduction achieved by the adjustment and raking process. Trimming is used only to reduce large weights, not to edit the data in any way.
Trimming was performed so that the maximum weight a respondent could have was four times the mean weight of all respondents. Weights greater than four times the mean weight were set equal to four times the mean weight. After trimming, the raking process was repeated so that survey estimates would still agree with the control totals. This trimming process was performed twice, separately for each of the Pre-9/11 and Post-9/11 data files.
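The trimming rule can be sketched as follows. The sketch assumes the mean is computed from the weights as they stand before trimming, which the text does not state explicitly; the weights and the function name are made up.

```python
def trim_weights(weights, cap_multiple=4.0):
    """Cap each weight at cap_multiple times the mean of the untrimmed weights."""
    cap = cap_multiple * sum(weights) / len(weights)
    return [min(w, cap) for w in weights]

# Made-up weights: nine small weights and one very large one.
weights = [1.0] * 9 + [41.0]
trimmed = trim_weights(weights)  # mean is 5.0, so the cap is 20.0
```

Trimming lowers the sum of the weights (here from 50.0 to 29.0), so the trimmed weights no longer agree with the control totals until the raking step is run again.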
Long-distance Trip Data Sets
The 2001 NHTS long-distance trip data file was divided into two parts. Any trip taken by a person whose assigned travel day was on or before the cutoff point of September 25, 2001, belongs to the Pre-9/11 long-distance trip data file. All other trips fall into the Post-9/11 long-distance trip data file. This means that some long-distance trips taken after 9/11 are in the Pre-9/11 long-distance trip data file, and some long-distance trips taken before 9/11 are in the Post-9/11 long-distance trip data file. Chapter 5 of this documentation explains a way to deal with these cases.
Long-distance Trip Weights
The person-level weights were incorporated into the long-distance trip data files by merging them with the long-distance trip data by house ID and person ID. Long-distance trip weights are simple functions of the person weights described above, modified only for the purpose of producing annual estimates of the number of person trips. Long-distance trips were recorded over a 28-day period, so the long-distance trip weight is simply equal to the final person-level weight multiplied by 365/28.
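A minimal sketch of this merge and annualization is shown below. The field names house_id, person_id, and trip_weight are hypothetical stand-ins for the identifiers in the data files, and the weights are made up.

```python
DAYS_PER_YEAR = 365
REFERENCE_DAYS = 28  # length of the long-distance reporting period

def attach_trip_weights(trips, person_weights):
    """Look up each trip's person weight by (house_id, person_id) and annualize it.

    trips          : list of dicts with 'house_id' and 'person_id' keys
    person_weights : dict mapping (house_id, person_id) -> final person weight
    """
    weighted = []
    for trip in trips:
        key = (trip["house_id"], trip["person_id"])
        trip_weight = person_weights[key] * DAYS_PER_YEAR / REFERENCE_DAYS
        weighted.append({**trip, "trip_weight": trip_weight})
    return weighted

# Made-up person weights and trips.
person_weights = {("H1", 1): 2800.0, ("H1", 2): 1400.0}
trips = [
    {"house_id": "H1", "person_id": 1, "trip_id": "a"},
    {"house_id": "H1", "person_id": 1, "trip_id": "b"},
    {"house_id": "H1", "person_id": 2, "trip_id": "c"},
]
weighted_trips = attach_trip_weights(trips, person_weights)
```

Every trip reported by the same person carries the same trip weight, so summing the trip weights over a set of trips gives an annualized estimate of person trips for that set.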