3. Sampling Weights and Adjustments

This section discusses the development of survey weights. Two types of weights were used in the present survey: pre-population adjustment weights (to correct for unequal selection probabilities) and post-stratification weights (to correct for known discrepancies between the sample and the population). The final analysis weight reflects both types of adjustment: the adjustments for non-response, multiple telephone lines, and persons per household, as well as the post-stratification adjustments. The final analysis weight is the weight that should be used for analyzing the survey data.

The final analysis weight was developed using the following steps:

  • Calculation of the base sampling weights;
  • Adjustment for unit non-response;
  • Adjustment for households with multiple voice telephone numbers;
  • Adjustment for selecting an adult within a sampled household; and
  • Post-stratification adjustments to the target population.

The product of the above variables represents the final analysis weight. If needed, extreme values of the final analysis weight can be reduced (or trimmed) using standard weight trimming procedures.
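Written out as a single expression, the chain of products before any trimming can be summarized as follows. This is a compact restatement rather than a formula given in the report; it uses the symbols introduced in Sections 3.1 through 3.5, and it assumes that the multiplier M and the deflation factor DEF defined in Section 3.5 together make up the post-stratification step.

$$W_{FINAL} = W_S \times ADJ_{NR} \times ADJ_{MT} \times ADJ_{RA} \times M \times DEF$$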

3.1 Base Sampling Weights

The first step in weighting the sample is to calculate the sampling weight for each telephone number in the sample. The sampling weight is the inverse of the telephone number's probability of selection:

$$W_S = \frac{N}{n}$$

where N is the total number of telephone numbers in the sampling frame and n is the total number of telephone numbers in the sample. For this survey, N is 282,271,600 for the national survey and 69,120,100 for the survey of targeted MSAs. The total number of telephone numbers in the sample (numbers dialed), n, is 5,350 for the national survey and 4,229 for the survey of targeted MSAs. The latter figure comprises 2,830 cases from the original sample of targeted MSAs and 1,399 cases that were sampled for the national survey and fell within the nine targeted MSAs.
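As a minimal sketch, the base weights implied by the frame and sample sizes reported above can be computed as shown here. The function and dictionary names are illustrative only and do not come from the survey documentation.

```python
# Frame sizes (N) and sample sizes (n) as reported in the text.
FRAME_SIZE = {"national": 282_271_600, "msa": 69_120_100}
SAMPLE_SIZE = {"national": 5_350, "msa": 4_229}


def base_weight(frame_size: int, sample_size: int) -> float:
    """Return W_S = N / n, the inverse of the selection probability n / N."""
    if sample_size <= 0:
        raise ValueError("sample_size must be positive")
    return frame_size / sample_size


# One base weight per survey; every sampled number in a survey shares it.
base_weights = {
    survey: base_weight(FRAME_SIZE[survey], SAMPLE_SIZE[survey])
    for survey in FRAME_SIZE
}
```

Under these figures each dialed number stands for roughly 52,761 frame numbers in the national survey and roughly 16,344 in the survey of targeted MSAs.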

3.2 Adjustment for Unit Non-Response

For the national survey, sampled telephone numbers are classified as responding or non-responding households according to Census division and metropolitan status (inside or outside a Metropolitan Statistical Area). The non-response adjustment factor for all telephone numbers in each Census division (c) by metropolitan status (s), is calculated as follows:

$$ADJ_{NR} = \frac{1}{CASRO_{\text{response rate}}(c, s)}$$

where the denominator is the CASRO response rate for Census division c and metropolitan status s. The non-response adjustment factor for a specific cell (defined by metropolitan status and Census division) is therefore the inverse of the cell's response rate; that is, it equals the ratio of the estimated number of telephone households to the number of completed surveys. For the survey of targeted MSAs, the cell for calculating the non-response adjustment factor is each of the nine targeted MSAs.

The non-response adjusted weight (WNR) is the product of the sampling weight (WS) and the non-response adjustment factor (ADJNR) within each stratum.
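The cell-level adjustment can be sketched as follows. The response rates and cell labels below are made-up placeholders, since the report does not list the rates by cell, and the function names are illustrative.

```python
def nonresponse_factor(response_rate: float) -> float:
    """ADJ_NR = 1 / CASRO response rate for a single cell."""
    if not 0.0 < response_rate <= 1.0:
        raise ValueError("response rate must lie in (0, 1]")
    return 1.0 / response_rate


def nonresponse_adjusted_weight(w_s: float, cell, response_rates: dict) -> float:
    """W_NR = W_S * ADJ_NR, using the response rate of the record's cell."""
    return w_s * nonresponse_factor(response_rates[cell])


# Hypothetical rates keyed by (Census division, metropolitan status).
example_rates = {
    ("Pacific", "inside MSA"): 0.25,
    ("Pacific", "outside MSA"): 0.40,
}
w_nr = nonresponse_adjusted_weight(1000.0, ("Pacific", "inside MSA"), example_rates)
```

With a hypothetical response rate of 0.25, the factor is 4, so a base weight of 1,000 becomes 4,000.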

3.3 Adjustment for Households with Multiple Telephone Numbers

Some households have multiple telephone lines for voice communication. Thus, these households have multiple chances of being selected into the sample, and adjustments must be made to their survey weights. The adjustment for multiple telephone lines follows:

$$ADJ_{MT} = \frac{1}{\min(\text{number of telephone lines in the household},\ 3)}$$

The number of lines used in the divisor is capped at three, so the weight is reduced by a factor of at most three. In other words, the adjustment factor ADJMT is one (1.00) if the household has one telephone line, one over two (0.50) if it has two, and one over three (0.33) if it has three or more.

Table 3 provides the summary statistics for the number of telephone lines in the sampled households.

Table 3: Number of Telephone Lines per Household

Statistic                 National    MSA
Mean                      1.04        1.063
Standard error of mean    0.007       0.01
Minimum                   1           1
25th percentile           1           1
Median                    1           1
75th percentile           1           1
Maximum                   4           4

For respondents who did not provide this information, it is assumed that the household contained only one telephone line. The non-response adjusted weight (WNR) is multiplied by the adjustment factor for multiple telephone lines (ADJMT) to create a weight that is adjusted for both non-response and multiple selection probability (WNRMT).
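A small sketch of this step is given below, including the cap at three lines and the one-line assumption for missing reports. The names are illustrative, and representing a missing report as `None` is a choice made for this example.

```python
from typing import Optional

MAX_LINES = 3  # cap on the divisor, as stated in the text


def multi_line_factor(n_lines: Optional[int]) -> float:
    """ADJ_MT = 1 / min(lines, 3); a missing report is treated as one line."""
    if n_lines is None:
        n_lines = 1
    if n_lines < 1:
        raise ValueError("a household must have at least one line")
    return 1.0 / min(n_lines, MAX_LINES)


def apply_multi_line(w_nr: float, n_lines: Optional[int]) -> float:
    """W_NRMT = W_NR * ADJ_MT."""
    return w_nr * multi_line_factor(n_lines)
```

For example, a household reporting four lines (the maximum observed in Table 3) receives the same factor as one reporting three.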

3.4 Adjustment for Number of Eligible Household Members

The probability of selecting an individual respondent depends on the number of eligible respondents in the household. Therefore, it is important to account for the total number of eligible household members when constructing the sampling weights. The adjustment for selecting a random adult household member follows:

ADJRA = Number of Eligible Household Members

Table 4 provides the summary statistics for the number of eligible members in the sampled households.

Table 4: Number of Eligible Household Members

Statistic                 National    MSA
Mean                      2.325       2.36
Standard error of mean    0.056       0.067
Minimum                   1           1
25th percentile           2           2
Median                    2           2
75th percentile           3           3
Maximum                   9           7

For respondents who did not provide this information, a value for ADJRA is imputed according to the distribution of the number of eligible persons in a household (from responding households) within the age, gender, and race/ethnicity cross-classification cell matching that of the respondent for whom the value is being imputed.

The weight adjusted for non-response and for multiple selection probability (WNRMT) is then multiplied by ADJRA, resulting in WNRMTRA, a weight adjusted for non-response, for multiple selection probability, and for selecting a random household member.
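The following sketch illustrates one way such an imputation and adjustment could be carried out. The report does not state the exact imputation mechanism; drawing a random donor value from the same cell is an assumption made here, and the record layout (`cell`, `eligible`) is hypothetical.

```python
import random
from collections import defaultdict


def impute_eligible_counts(records, rng):
    """Fill missing 'eligible' values with a random reported value from the same cell."""
    donors = defaultdict(list)
    for rec in records:
        if rec["eligible"] is not None:
            donors[rec["cell"]].append(rec["eligible"])
    completed = []
    for rec in records:
        filled = dict(rec)  # copy, so the input records are left untouched
        if filled["eligible"] is None:
            # A uniform draw over donors follows the cell's empirical distribution.
            # rng.choice raises IndexError if the cell has no donors at all.
            filled["eligible"] = rng.choice(donors[rec["cell"]])
        completed.append(filled)
    return completed


def apply_random_adult(w_nrmt: float, eligible: int) -> float:
    """W_NRMTRA = W_NRMT * ADJ_RA, where ADJ_RA is the number of eligible members."""
    return w_nrmt * eligible


# Hypothetical records; the cell label is a placeholder.
example_records = [
    {"cell": "cell-1", "eligible": 2},
    {"cell": "cell-1", "eligible": 3},
    {"cell": "cell-1", "eligible": None},
]
example_completed = impute_eligible_counts(example_records, random.Random(0))
```

Passing a seeded `random.Random` instance keeps the imputation reproducible between runs.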

3.5 Post-Stratification Adjustments

Adjusting weighted survey counts so that they agree with population counts provided by the Census Bureau can compensate for different response rates by demographic subgroups, increase the precision of survey estimates, and reduce the bias in the estimates due to the exclusion of households without telephones from sampling. The final adjustment to the survey weight is a post-stratification adjustment that allows the weights to sum to the target population (i.e., U.S. non-institutionalized persons 18 years of age or older) by age, gender, and race/ethnicity.

The outcome of post-stratification is a factor or multiplier (M) that scales WNRMTRA within each age/gender/race cell, so that the weighted marginal sums for age, gender, and race/ethnicity agree with the corresponding Census Bureau distribution for these characteristics. The method used in the post-stratification adjustment is a simple ratio adjustment applied to the sampling weight using the appropriate national population total for a given cell defined by the intersection of age, gender, and race/ethnicity.2 The general method for ratio adjusting follows:

  • A table of the sum of the weights for each cell denoted by each age, gender, and race/ethnicity combination is created. Each cell is denoted by S(i,j,k), where i is the indicator for age, j is the indicator for gender, and k is the indicator for race/ethnicity.
  • A similar table of national population controls is created, where each cell is denoted by P(i,j,k).
  • The ratio R(i,j,k) = P(i,j,k) / S(i,j,k) is calculated; the cell ratio R(i,j,k) is denoted as the multiplier M.
  • Each weight, at the record level, is multiplied by the appropriate cell ratio of R(i,j,k) to form the post-stratification adjustment.
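The four steps above can be sketched as follows. The symbols S, P, and R follow the text; the record layout and function names are hypothetical, and the handling of records without a cell (multiplier of one) reflects the rule the report states for respondents with missing demographic information.

```python
from collections import defaultdict


def poststratification_multipliers(records, population):
    """Return M = R(cell) = P(cell) / S(cell) for every cell with respondents."""
    weight_sums = defaultdict(float)  # S(i, j, k)
    for rec in records:
        if rec["cell"] is not None:
            weight_sums[rec["cell"]] += rec["w_nrmtra"]
    return {cell: population[cell] / s for cell, s in weight_sums.items()}


def poststratify(records, population):
    """Attach W_NRMTRAPS = M * W_NRMTRA to each record."""
    multipliers = poststratification_multipliers(records, population)
    adjusted = []
    for rec in records:
        m = multipliers.get(rec["cell"], 1.0)  # M = 1 when the cell is unknown
        adjusted.append({**rec, "w_nrmtraps": rec["w_nrmtra"] * m})
    return adjusted


# Hypothetical cells (i, j, k) and population controls P(i, j, k).
example_population = {(1, 1, 1): 1000.0, (2, 1, 1): 600.0}
example_records = [
    {"cell": (1, 1, 1), "w_nrmtra": 100.0},
    {"cell": (1, 1, 1), "w_nrmtra": 300.0},
    {"cell": (2, 1, 1), "w_nrmtra": 200.0},
    {"cell": None, "w_nrmtra": 50.0},
]
example_adjusted = poststratify(example_records, example_population)
```

After the adjustment, the weights within each cell sum to that cell's population control, which is the defining property of a ratio adjustment.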

For the national sample, cells used in the post-stratification are defined by the combination of age, gender, and race/ethnicity.3 Some race/ethnicity or, preferably, age categories may be merged if the number of completed interviews within the corresponding cells falls below 30. For this survey, many of the cells have fewer than 30 observations. After grouping, and to remain consistent with previous surveys, a total of 16 cells are used for the national sample and 10 for the sample of targeted MSAs. For the sample of targeted MSAs, cells for post-stratification are defined only by the combination of gender and age because information on race/ethnicity is lacking. The details are in the following two tables.

Table 5: Post-Stratification Cells - National

CELL  DESCRIPTION                                           SAMPLE SIZE  POPULATION
1     Male - Hispanic (age 18 and over)                     37           16,025,259
2     Male - Black, non-Hispanic (age 18 and over)          24           12,295,956
3     Male - White, non-Hispanic (age 18-34)                26           21,569,336
4     Male - White, non-Hispanic (age 35-44)                35           13,569,404
5     Male - White, non-Hispanic (age 45-54)                75           15,668,930
6     Male - White, non-Hispanic (age 55-64)                73           12,513,255
7     Male - White, non-Hispanic (age 65 and over)          123          13,329,864
8     Male - Other race, non-Hispanic (age 18 and over)     54           6,918,128
9     Female - Hispanic (age 18 and over)                   46           14,825,817
10    Female - Black, non-Hispanic (age 18 and over)        52           14,196,535
11    Female - White, non-Hispanic (age 18-34)              35           20,862,430
12    Female - White, non-Hispanic (age 35-44)              60           13,496,575
13    Female - White, non-Hispanic (age 45-54)              86           15,909,704
14    Female - White, non-Hispanic (age 55-64)              91           13,100,051
15    Female - White, non-Hispanic (age 65 and over)        169          17,908,073
16    Female - Other race, non-Hispanic (age 18 and over)   69           7,494,516
N/A   Missing demographic information                       27
      TOTAL                                                 1,082        229,683,833

Table 6: Post-Stratification Cells - MSA

CELL  DESCRIPTION                       SAMPLE SIZE  POPULATION
1     Male - age 18-34                  30           8,289,508
2     Male - age 35-44                  50           5,471,778
3     Male - age 45-54                  66           5,305,946
4     Male - age 55-64                  53           3,742,602
5     Male - age 65 and over            91           3,625,639
6     Female - age 18-34                55           8,072,874
7     Female - age 35-44                63           5,526,391
8     Female - age 45-54                100          5,512,983
9     Female - age 55-64                79           4,137,051
10    Female - age 65 and over          120          5,083,986
N/A   Missing demographic information   13
      TOTAL                             720          54,768,758

Those respondents who did not supply the demographic information necessary to categorize their age, gender, and/or race/ethnicity are excluded from the post-stratification process and assigned a value of one for M.

The multiplier M is then applied to WNRMTRA to create WNRMTRAPS. However, WNRMTRAPS is overstated because a portion of the sample is not included in the calculation of the post-stratification adjustment. Therefore, a deflation factor is applied to the value of WNRMTRAPS. The deflation factor DEF for the national sample is calculated as follows:

$$DEF = \frac{\sum_{i=1}^{5}\sum_{j=1}^{2}\sum_{k=1}^{4} P(i, j, k)}{TW_{NRMTRA\_NA} + \sum_{i=1}^{5}\sum_{j=1}^{2}\sum_{k=1}^{4} P(i, j, k)}$$

Where:

P(i, j, k) is the national population count for cell (i, j, k); and

TWNRMTRA_NA is the sum of the WNRMTRA weights for respondents with missing demographic information.

The deflation factor DEF for the sample of targeted MSAs is calculated as follows:

$$DEF = \frac{\sum_{i=1}^{5}\sum_{j=1}^{2} P(i, j)}{TW_{NRMTRA\_MSA} + \sum_{i=1}^{5}\sum_{j=1}^{2} P(i, j)}$$

Where:

P(i, j) is the MSA population count for cell (i, j); and

TWNRMTRA_MSA is the sum of the WNRMTRA weights for respondents with missing demographic information.

This deflation factor denotes the proportion of the target population represented by respondents with non-missing demographic information. The final analysis weight, WFINAL, is the scaled value of WNRMTRAPS, calculated as follows:

WFINAL = DEF x WNRMTRAPS
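The deflation step can be illustrated with a short sketch. The numbers below are invented for the example (the report does not give the sum of weights for respondents with missing demographics), and the function names are illustrative.

```python
def deflation_factor(population_total: float, missing_weight_total: float) -> float:
    """DEF = sum of P over all cells / (TW for missing cases + sum of P)."""
    return population_total / (missing_weight_total + population_total)


def final_weights(w_nrmtraps, def_factor: float):
    """W_FINAL = DEF * W_NRMTRAPS for every record."""
    return [def_factor * w for w in w_nrmtraps]


# Hypothetical inputs: post-stratified weights sum to the population total
# (1,600) for classified cases, plus 400 for one case with missing demographics.
example_w_nrmtraps = [250.0, 750.0, 600.0, 400.0]
example_def = deflation_factor(population_total=1600.0, missing_weight_total=400.0)
example_w_final = final_weights(example_w_nrmtraps, example_def)
```

Because the classified cases already sum to the population total and the unclassified cases add their own weight on top, multiplying every weight by DEF brings the overall weighted total back to the population total.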

WFINAL can be viewed as the number of population members that each respondent represents.

3.6 Trimming of Final Analysis Weights

Extreme values of WFINAL are trimmed to avoid over-inflation of the sampling variance. In short, the trimming process limits the relative contribution of the variance associated with the jth unit to the overall variance of the weighted estimate by comparing the square of each weight to a threshold value determined as a multiple of the sum of the squared weights. Letting w1, w2, ..., wn denote the final analysis weights for the n completed interviews, the threshold value is calculated using the following formula:

$$\text{Threshold} = \sqrt{\frac{10 \sum_{j=1}^{n} w_j^2}{n}}$$

Each household having a final analysis weight that exceeds the determined threshold value is assigned a trimmed weight equal to the threshold. Next, the age/gender/race cell used in the post-stratification is identified for each household with a trimmed weight. To maintain the overall weighted sum within the cell, the trimmed portions of the original weights are reassigned to the cases whose weights are unchanged in the trimming process.

For cases having trimmed weights but missing age, gender, and/or race/ethnicity information, the trimmed portions of the original weights are assigned to all remaining cases whose weights are unchanged in the trimming process.

The entire trimming procedure is then repeated on the new set of weights: a new threshold value is calculated and any new extreme values are adjusted in the same way. The process is repeated until no new extreme values are found.
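A minimal sketch of the iterative procedure is given below. Several details are assumptions, because the report does not specify them: the trimmed excess is shared among recipients in proportion to their current weights, a small relative tolerance decides when a weight counts as exceeding the threshold (so the loop terminates under floating-point arithmetic), and the iteration cap is arbitrary. All names are illustrative.

```python
import math


def trim_threshold(weights):
    """Threshold = sqrt(10 * sum(w_j^2) / n)."""
    return math.sqrt(10.0 * sum(w * w for w in weights) / len(weights))


def trim_weights(weights, cells, rel_tol=1e-9, max_iter=1000):
    """Iteratively trim extreme weights while preserving weighted sums.

    cells[j] is the post-stratification cell of case j, or None if unknown.
    """
    w = list(weights)
    for _ in range(max_iter):
        t = trim_threshold(w)
        extreme = {j for j, wj in enumerate(w) if wj > t * (1.0 + rel_tol)}
        if not extreme:
            break  # no new extreme values: done
        # Cut each extreme weight to the threshold and pool the excess by cell.
        excess = {}
        for j in extreme:
            excess[cells[j]] = excess.get(cells[j], 0.0) + (w[j] - t)
            w[j] = t
        for cell, amount in excess.items():
            # Recipients are untrimmed cases in the same cell, or all
            # untrimmed cases when the trimmed case has no cell.
            recipients = [
                j for j in range(len(w))
                if j not in extreme and (cell is None or cells[j] == cell)
            ]
            base = sum(w[j] for j in recipients)
            if base <= 0:
                continue  # no recipient available; the excess is dropped
            for j in recipients:
                w[j] += amount * w[j] / base  # proportional share (assumption)
    return w


# Hypothetical example: twenty small weights and one very large one, same cell.
example_weights = [1.0] * 20 + [100.0]
example_cells = ["cell-1"] * 21
example_trimmed = trim_weights(example_weights, example_cells)
```

In this example the large weight is pulled down over several passes while the total weight in the cell is preserved, since each pass lowers the sum of squared weights and therefore the threshold.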

2 The Census Bureau provides a detailed breakdown of population count by age, gender, and race/ethnicity.

3 The four race/ethnicity categories used for post-stratification purposes are: Hispanic (any race); Black, non-Hispanic; White, non-Hispanic; and Other, non-Hispanic.