## 4. Variance Estimation

The data collected in the October 2009 OHS were obtained through a complex sample design involving stratification, and the final weights were subject to several adjustments. Any variance estimation methodology must involve some simplifying assumptions about the design and weighting, and so some simplified conceptual design structures are provided in this section.

### 4.1 Variance Estimation Methodology

#### 4.1.1 Software

The software package SUDAAN® (Software for the Statistical
Analysis of Correlated Data), Version 10.0.1, was used to compute standard
errors. SUDAAN® is a statistical software package developed by the Research
Triangle Institute to analyze data from complex sample surveys. It uses
advanced statistical techniques to produce robust variance estimates under
various survey design options, and it can handle stratification and the
numerous adjustments associated with weighting.

#### 4.1.2 Methods

Overall, three variables, CENDIV (Census division), METRO
(metropolitan status), and FNLWGT (final analysis weights), are needed for
variance estimation in SUDAAN® for the analysis of the national survey data.
Two variables, MSASTRAT (MSA) and FNLWGT (final analysis weights), are needed
for variance estimation in SUDAAN® for the analysis of the MSA survey data. The
method used in the present survey uses the variables CENDIV and METRO to create
18 (9 × 2) strata in the national survey data and the variable MSASTRAT to
create nine strata, assumes single-stage selection with replacement, and
applies the final analysis weights. This method provides somewhat conservative
standard error estimates.
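Under this stratified, single-stage, with-replacement assumption, the variance of a weighted total has a standard closed form: Var(Ŷ) = Σ_h [n_h/(n_h − 1)] Σ_i (z_hi − z̄_h)², where z_hi = w_hi·y_hi and h indexes strata. A minimal Python sketch of this estimator follows; the function name and data layout are illustrative and are not part of SUDAAN®:

```python
def strwr_total_variance(y, weights, strata):
    """Variance of a weighted total under a stratified, single-stage,
    with-replacement design (SUDAAN's DESIGN = STRWR assumption):
    Var(Y_hat) = sum_h n_h/(n_h - 1) * sum_i (z_hi - zbar_h)^2,
    where z_hi = w_hi * y_hi."""
    # Group the weighted observations z_hi = w_i * y_i by stratum.
    by_stratum = {}
    for yi, wi, h in zip(y, weights, strata):
        by_stratum.setdefault(h, []).append(wi * yi)
    var = 0.0
    for z in by_stratum.values():
        n_h = len(z)
        if n_h < 2:
            continue  # a single-unit stratum contributes no estimable variance
        zbar = sum(z) / n_h
        var += n_h / (n_h - 1) * sum((zi - zbar) ** 2 for zi in z)
    return var
```

Because the with-replacement formula needs only stratum membership and the weighted values, it matches the three-variable (or two-variable, for the MSA file) setup described above.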

Assuming this simplified sample design structure, the
following SUDAAN® statements can be used. Note that the data file for the
national survey must be sorted by the variables CENDIV and METRO, and the data
file for the MSA survey must be sorted by the variable MSASTRAT, before either
is used in SUDAAN®.

For the national data:

```
PROC ... DESIGN = STRWR;
NEST CENDIV METRO;
WEIGHT FNLWGT;
```

For the MSA data:

```
PROC ... DESIGN = STRWR;
NEST MSASTRAT;
WEIGHT FNLWGT;
```

More precisely, the following code is used to produce unweighted and weighted frequency counts, percentages, and standard errors (the variable of interest here is "var1," a categorical variable with seven levels):

For the national survey data:

```
PROC CROSSTAB DATA = datafile DESIGN = STRWR;
WEIGHT FNLWGT;
NEST CENDIV METRO;
SUBGROUP var1;
LEVELS 7;
TABLE var1;
PRINT nsum wsum totper setot / STYLE = nchs;
RUN;
```

For the MSA data:

```
PROC CROSSTAB DATA = datafile DESIGN = STRWR;
WEIGHT FNLWGT;
NEST MSASTRAT;
SUBGROUP var1;
LEVELS 7;
TABLE var1;
PRINT nsum wsum totper setot / STYLE = nchs;
RUN;
```
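The weighted percentage and its standard error that PROC CROSSTAB reports for one level of a categorical variable can be approximated outside SUDAAN® by Taylor linearization under the same stratified with-replacement assumptions. The following Python sketch is hypothetical (the function name and interface are not SUDAAN's); it scores each record as 1 if it falls in the level of interest and 0 otherwise:

```python
import math

def weighted_percent_se(indicator, weights, strata):
    """Weighted percentage for one level of a categorical variable and its
    Taylor-linearized standard error under a stratified, single-stage,
    with-replacement design. `indicator` is 1 where the record falls in
    the level of interest, else 0."""
    W = sum(weights)
    p = sum(w * d for w, d in zip(weights, indicator)) / W
    # Linearized scores z_i = w_i * (d_i - p) / W, accumulated by stratum.
    by_stratum = {}
    for d, w, h in zip(indicator, weights, strata):
        by_stratum.setdefault(h, []).append(w * (d - p) / W)
    var = 0.0
    for z in by_stratum.values():
        n_h = len(z)
        if n_h < 2:
            continue  # single-unit strata contribute no estimable variance
        zbar = sum(z) / n_h
        var += n_h / (n_h - 1) * sum((zi - zbar) ** 2 for zi in z)
    return 100 * p, 100 * math.sqrt(var)
```

Running this per level of "var1" reproduces, approximately, the `totper` and `setot` columns requested in the PRINT statement above.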

### 4.2 Degrees of Freedom and Precision

A rule of thumb for the degrees of freedom associated with a
standard error is the number of unweighted records in the dataset *minus* the
number of strata. The degrees of freedom for the method above therefore vary
with the number of records in each dataset. Generally, the dataset for the
national sample will yield roughly 1,000 degrees of freedom, and the dataset
for the sample of targeted MSAs roughly 500. For practical purposes, any
degrees of freedom exceeding 120 are treated as infinite, so one can use a
normal distribution instead of a *t*-distribution for the statistic.
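The rule of thumb reduces to a one-line calculation. In the sketch below the record counts are hypothetical, chosen only to land near the approximate figures quoted above (they are not the actual OHS counts):

```python
def degrees_of_freedom(n_records, n_strata):
    """Rule-of-thumb df: unweighted records minus number of strata."""
    return n_records - n_strata

# Hypothetical record counts for illustration only.
national_df = degrees_of_freedom(1018, 18)  # 18 strata = 9 divisions x 2
msa_df = degrees_of_freedom(509, 9)         # 9 MSA strata

# With df > 120, the normal critical value (1.96 for a 95% confidence
# interval) is used in place of the t critical value.
use_normal = national_df > 120 and msa_df > 120
```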