4. Variance Estimation

The data collected in the October 2009 OHS were obtained through a complex sample design involving stratification, and the final weights were subject to several adjustments. Any variance estimation methodology must involve some simplifying assumptions about the design and weighting, and so some simplified conceptual design structures are provided in this section.

4.1 Variance Estimation Methodology

4.1.1 Software

The software package SUDAAN® (Software for the Statistical Analysis of Correlated Data) Version 10.0.1 was used for computing standard errors. SUDAAN® is a statistical software package developed by the Research Triangle Institute to analyze data from complex sample surveys. SUDAAN® uses advanced statistical techniques to produce robust variance estimates under various survey design options. The software can handle stratification and numerous adjustments associated with weighting.

4.1.2 Methods

Three variables are needed for variance estimation in SUDAAN® when analyzing the national survey data: CENDIV (Census division), METRO (metropolitan status), and FNLWGT (final analysis weights). Two variables are needed when analyzing the MSA survey data: MSASTRAT (MSA) and FNLWGT. The method used in the present survey combines three elements: strata, a single-stage with-replacement selection procedure, and the final analysis weights. CENDIV and METRO define 18 (9 x 2) strata in the national survey data, and MSASTRAT defines nine strata in the MSA survey data. This method provides somewhat conservative standard error estimates.
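The kind of computation that a stratified, single-stage, with-replacement specification implies can be illustrated outside SUDAAN®. The following Python sketch is not SUDAAN® code and is not taken from the survey documentation. It assumes the textbook Taylor-linearization formula for a weighted proportion, with each record treated as its own sampling unit within its stratum. The function name `weighted_proportion_se` and the tuple layout are hypothetical.

```python
from collections import defaultdict
from math import sqrt


def weighted_proportion_se(records):
    """Estimate a weighted proportion and its standard error.

    records: iterable of (stratum, weight, indicator) tuples, where
    indicator is 1 if the record falls in the category of interest, else 0.
    Assumption: stratified single-stage with-replacement sampling,
    textbook linearization; not SUDAAN's internal implementation.
    """
    records = list(records)
    total_w = sum(w for _, w, _ in records)
    p_hat = sum(w * y for _, w, y in records) / total_w

    # Linearized contribution of each record, grouped by stratum.
    by_stratum = defaultdict(list)
    for stratum, w, y in records:
        by_stratum[stratum].append(w * (y - p_hat) / total_w)

    variance = 0.0
    for z in by_stratum.values():
        n_h = len(z)
        if n_h < 2:
            raise ValueError("each stratum needs at least two records")
        mean_z = sum(z) / n_h
        # Between-record variability within the stratum, scaled by n_h / (n_h - 1).
        variance += n_h / (n_h - 1) * sum((v - mean_z) ** 2 for v in z)

    return p_hat, sqrt(variance)
```

For the national data, the stratum key would be the pair (CENDIV, METRO) and the weight would be FNLWGT. For the MSA data, the stratum key would be MSASTRAT.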

Assuming a simplified sample design structure, the SUDAAN® statements below can be used. Before use in SUDAAN®, the data file for the national survey must be sorted by the variables CENDIV and METRO, and the data file for the MSA survey must be sorted by the variable MSASTRAT.

For the national data:
PROC ... DESIGN = STRWR;
NEST CENDIV METRO;
WEIGHT FNLWGT;

For the MSA data:
PROC ... DESIGN = STRWR;
NEST MSASTRAT;
WEIGHT FNLWGT;

For example, the following code produces unweighted and weighted frequency counts, percentages, and standard errors. The variable of interest here is "var1", a categorical variable with seven levels.

For the national survey data:
PROC CROSSTAB DATA = datafile DESIGN = STRWR;
WEIGHT FNLWGT;
NEST CENDIV METRO;
SUBGROUP var1;
LEVELS 7;
TABLE var1;
PRINT nsum wsum totper setot / STYLE = nchs;
RUN;

For the MSA data:
PROC CROSSTAB DATA = datafile DESIGN = STRWR;
WEIGHT FNLWGT;
NEST MSASTRAT;
SUBGROUP var1;
LEVELS 7;
TABLE var1;
PRINT nsum wsum totper setot / STYLE = nchs;
RUN;
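As an informal cross-check on the counts and percentages that such a table reports, the same quantities can be tallied directly. The Python sketch below is an illustration only, not SUDAAN® code. The function name `frequency_table` and the input layout are hypothetical, and the dictionary keys merely reuse the names from the PRINT statement for readability. Standard errors are not computed here.

```python
from collections import defaultdict


def frequency_table(records):
    """Tally unweighted counts, weighted counts, and weighted percentages.

    records: iterable of (level, weight) pairs, where level is the value
    of the categorical variable and weight is the final analysis weight.
    """
    nsum = defaultdict(int)      # unweighted record count per level
    wsum = defaultdict(float)    # sum of weights per level
    for level, weight in records:
        nsum[level] += 1
        wsum[level] += weight

    total_w = sum(wsum.values())
    return {
        level: {
            "nsum": nsum[level],
            "wsum": wsum[level],
            "totper": 100.0 * wsum[level] / total_w,  # percent of weighted total
        }
        for level in sorted(nsum)
    }
```

With the survey files, each pair would hold a record's var1 value and its FNLWGT.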

4.2 Degrees of Freedom and Precision

A rule of thumb for the degrees of freedom associated with a standard error is the number of unweighted records in the dataset minus the number of strata. The degrees of freedom for the method above therefore vary with the number of records in each dataset. Generally, the dataset for the national sample yields about 1,000 degrees of freedom, and the dataset for the sample of targeted MSAs yields about 500. For practical purposes, any degrees of freedom exceeding 120 are treated as infinite, so a normal distribution can be used in place of a t-distribution for the statistic.
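The rule of thumb above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not part of the survey's processing: the function names are hypothetical, and the record count in the usage note is invented to land near the 1,000 degrees of freedom mentioned for the national sample.

```python
from statistics import NormalDist


def degrees_of_freedom(n_records, n_strata):
    """Rule of thumb: unweighted record count minus number of strata."""
    return n_records - n_strata


def normal_confidence_interval(estimate, se, df, level=0.95, threshold=120):
    """Confidence interval using a normal quantile.

    Only valid under the rule that degrees of freedom above `threshold`
    are treated as infinite; smaller values call for a t quantile instead.
    """
    if df <= threshold:
        raise ValueError("degrees of freedom too small for the normal approximation")
    z = NormalDist().inv_cdf(0.5 + level / 2)  # two-sided critical value
    return estimate - z * se, estimate + z * se
```

For instance, a hypothetical national file of 1,018 records with 18 strata gives `degrees_of_freedom(1018, 18)` equal to 1,000, well above the threshold, so the normal quantile applies.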