## Using Bayesian Updating to Enhance 2001 NHTS Kentucky Sample Data for Travel Demand Modeling

**BING MEI ^{1} *
THOMAS A. COONEY ^{2}
NIELS R. BOSTROM ^{3}**

### ABSTRACT

This paper investigates the utility of the 2001 National Household Travel Survey Kentucky standard and add-on samples for statewide, rural county, and small urban area travel demand modeling. The weaknesses of the Kentucky standard sample for deriving trip rates and average trip lengths are identified, which include greater uncertainty caused by a small sample size and suspiciously low trip rates for urban clusters (urban areas with less than 50,000 population). We show that the Kentucky add-on sample can be used to enhance the Kentucky standard sample for developing trip rates and average trip lengths. Combining the two samples using Bayesian updating resulted in improved trip rates and average trip lengths.

KEYWORDS: National Household Travel Survey, surveys, Kentucky, trip generation, trip distribution, Bayes' theorem, data transferability, travel demand model, transportation planning.

### INTRODUCTION

The objective of this research was to evaluate the utility of the 2001 National Household Travel Survey (NHTS) Kentucky samples, including both standard and add-on, for rural county, small urban area, and statewide travel demand modeling in Kentucky. Specifically, trip rates by trip purpose and by area type derived from the samples were examined by comparison with other samples constructed based on the 2001 NHTS national sample. Average trip lengths were also analyzed in the same manner.

The 2001 NHTS was conducted as an update to and integration of the earlier Nationwide Personal Transportation Survey, which focused on short daily household trips, and the American Travel Survey, which focused on long trips. Approximately 26,000 households were surveyed nationwide for the NHTS. The NHTS survey data include characteristics of households, people, and vehicles, as well as detailed information on daily and long-distance travel for all purposes and by all modes.^{1}

The NHTS is designed to collect data from a nationally representative sample of households in order to provide statistically accurate travel estimates at the national level. Sample data in the NHTS are not intended to be adequate for statewide or area-specific estimates. As a result, the Kentucky Transportation Cabinet (KYTC) participated in the NHTS add-on program initiated by the U.S. Department of Transportation to obtain more household travel data within Kentucky. Under the add-on program, an additional 1,154 households were surveyed in four counties in Kentucky (KYTC 2002). The primary purpose of conducting the Kentucky add-on survey was to achieve a larger sample size suitable for revising travel demand model parameters for rural county, small urban area, and statewide modeling.

### DATA

#### 2001 NHTS Kentucky Standard Sample

The Kentucky households in the 2001 NHTS national sample were selected to form a separate sample, which is called the Kentucky *standard sample* in this paper to distinguish it from the Kentucky *add-on sample.* Because the NHTS was designed to collect data from a nationally representative sample of households to provide statistically valid estimates at the national level, the number of households in each state is relatively small. There were only 390 Kentucky households in the national sample, of which 338 had completed trip reports for all household members. These 338 households made 2,785 motorized trips on their assigned travel days, including both weekdays and weekends. Of these, 378 (13.6%) were home-based work trips, 1,546 (55.5%) were home-based other trips, and 861 (30.9%) were nonhome-based trips. Due to the inclusion of weekend trips, home-based work trips account for a lower percentage than that commonly reported for weekday travel demand modeling.

As a fairly common modeling practice, different trip rates are used for different area types. Since the census area-type classification of urbanized area, urban cluster, and rural area has been incorporated into the NHTS dataset, this study employed these area types. The U.S. Census Bureau (USDOC 2000) defines an *urban cluster* as a densely settled area that has a population of 2,500 to 49,999, while an *urbanized area* is defined as a densely settled area that has a population of at least 50,000. Both urban clusters and urbanized areas generally consist of a geographic core of block groups or blocks that have a population density of at least 1,000 people per square mile, adjacent block groups and blocks with at least 500 people per square mile, and less densely settled blocks that form enclaves or indentations, or are used to connect discontiguous areas with qualifying densities. *Rural* consists of all territory, population, and housing units located outside of urbanized areas and urban clusters (USDOC 2000).

Following census definitions, 140 of the 338 Kentucky standard sample households were in urbanized areas, 63 in urban clusters, and 135 in rural areas. These relatively small sample sizes, especially for the urban clusters, raise concerns about the reliability of the survey results for estimating Kentucky-specific trip rates by area type and purpose. Trip rates by area type and trip purpose derived from this sample are presented later with a detailed set of tables.

#### 2001 NHTS Kentucky Add-On Sample

For the Kentucky add-on sample, an additional 1,154 households were randomly selected and surveyed between June 2001 and June 2002 in Carter, Edmonson, Pulaski, and Scott Counties in Kentucky. KYTC classified these four counties as typical non-urbanized counties. Figure 1 shows the location of these counties, while table 1 summarizes the distribution of add-on sample households by area type. Daily household trip information was collected in the sample. Of the 1,154 households, 335 were located in urban clusters, 819 in rural areas, and none were located in urbanized areas. All 2,828 persons in these households reported their travel day trips, which included 9,710 motorized person-trips in total on both weekdays and weekends. Broken down by trip purpose, there were 1,249 (12.9%) home-based work trips, 5,347 (55.1%) home-based other trips, and 3,114 (32%) nonhome-based trips.

#### Other Datasets

At the time the NHTS survey was conducted, Kentucky was part of the East South Central Census Division and had an overall population of approximately 4.1 million, with 7 metropolitan statistical areas (MSAs) completely or partially located within the state. Two of the 7 MSAs had a population over 1 million and fell into the MSA size category of 1 million to 3 million population. Populations of the others ranged from 100,000 to 500,000. There were no MSAs with a population of 3 million or more.

To check the reasonableness of the trip rates and average trip lengths derived from the Kentucky standard and add-on samples, several other datasets with larger sample sizes were constructed by selecting relevant households from the 2001 NHTS national sample. These datasets are collectively called *non-Kentucky samples* in this paper and were constructed as follows:

- **Households nationwide with the exclusion of those in MSAs with a population of 3 million or more.** We refer to this sample as the *national sample.* A total of 15,443 households that made 141,769 daily motorized trips fell into this category.
- **Households in the East South Central Census Division excluding the state of Kentucky.** This division included Alabama, Mississippi, Tennessee, and Kentucky. No MSAs had a population of 3 million or more in this area. A total of 902 households that made 8,348 daily motorized trips fell into this category. We refer to this sample as the *East South Central sample.*
- **Households in selected states surrounding Kentucky, excluding those in MSAs with 3 million residents or more.** The selected states included Tennessee, Missouri, Illinois, Indiana, and Ohio. We refer to this sample as the *surrounding-states sample.* The resulting dataset consisted of 2,904 households reporting a total of 27,638 motorized trips.
- **Households in states with similar socioeconomic characteristics (in terms of household annual income and household size) to Kentucky, excluding those in MSAs of 3 million residents or more.** This sample is referred to as the *similar-SE-states sample* and consisted of 2,781 households reporting 25,838 motorized trips.

To select states for the similar-SE-states sample, distributions of household annual income and household size of candidate states were compared with those of Kentucky. Only those with similar distributions were included in the sample, which finally consisted of Alabama, Arkansas, Iowa, Louisiana, Missouri, Mississippi, Oklahoma, South Carolina, and Tennessee. Collectively, the household income and size distributions of the similar-SE-states sample were relatively close to those of the Kentucky standard sample.

### ANALYSIS

#### KY Standard Sample Trip Rates and Average Trip Lengths

**Trip Rates**

The NHTS dataset includes weights to expand the sample data to the U.S. population. For this study, household person-trip rates were developed with both unweighted and weighted survey data. Only motorized trips were included; bicycle, walk, and other nonmotorized trips were excluded from the data. Both weekday and weekend trips were included. To reduce the effect of smaller sample sizes on the accuracy of estimates, trip rates assessed by statistical tests were only classified by trip purpose (home-based work, home-based other, and nonhome-based) and by area type (urbanized area, urban cluster, and rural area). More detailed classifications of trip rates (e.g., by household size, number of vehicles owned, and/or household income) were not attempted in this study.
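The distinction between unweighted and weighted household trip rates can be illustrated with a minimal sketch. The `Household` record, its field names, and the three example households are hypothetical and are not drawn from the NHTS files; the sketch only assumes that each household carries a trip count and an expansion weight.

```python
from dataclasses import dataclass


@dataclass
class Household:
    trips: int      # motorized person-trips reported on the travel day
    weight: float   # expansion weight (hypothetical values, not NHTS data)


def unweighted_rate(households):
    """Simple mean of trips per household."""
    return sum(h.trips for h in households) / len(households)


def weighted_rate(households):
    """Weight-expanded trips divided by weight-expanded households."""
    total_weight = sum(h.weight for h in households)
    return sum(h.trips * h.weight for h in households) / total_weight


# Illustrative households only.
sample = [Household(4, 1000.0), Household(10, 3000.0), Household(7, 1000.0)]
print(unweighted_rate(sample))  # 21 trips / 3 households
print(weighted_rate(sample))    # 41,000 expanded trips / 5,000 expanded households
```

In this toy case the weighted rate exceeds the unweighted rate because the household with the most trips also carries the largest weight, which is the kind of effect that produces differing weighted and unweighted rates in the tables.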

The statistical comparison of trip rates from different data sources was made using the common *t*-statistic of the difference of two means. Trip rates for each of the samples are displayed in tables 2 through 6.
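As a rough illustration of such a comparison, the sketch below computes one common form of the statistic (the unequal-variance form) from two means and their standard errors. The paper does not state which variant was used, so this form is an assumption; the input values are hypothetical, and the 1.96 threshold is the large-sample approximation for a two-sided test at the 0.05 level.

```python
import math


def t_statistic(mean_a, se_a, mean_b, se_b):
    """t-statistic for the difference of two independent means,
    given each mean's standard error (unequal-variance form)."""
    return (mean_a - mean_b) / math.sqrt(se_a ** 2 + se_b ** 2)


# Hypothetical trip rates and standard errors, for illustration only.
t = t_statistic(6.0, 0.6, 9.0, 0.8)

# Large-sample critical value; small samples call for a t-table value instead.
significant = abs(t) > 1.96
print(t, significant)
```

For a cell with few households, such as the 63 urban cluster households in the Kentucky standard sample, the critical value from the *t* distribution would be somewhat larger than 1.96.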

Table 2 shows that the Kentucky standard sample produced the lowest all-purpose household trip rate (8.53 weighted, 8.24 unweighted). The *t*-tests indicate a significant difference exists between the Kentucky standard sample rate and those of the national, the East South Central, the surrounding-states, and the similar-SE-states samples at the 0.05 level of significance. By area type, Kentucky standard sample rates are also lower than these rates, especially in the urban cluster category where 5.73 weighted and 6.16 unweighted trips were generated on average. These rates are significantly different from all other non-Kentucky rates at the 0.05 level of significance. Because there were only 63 households in this category, the Kentucky urban cluster rates may not be statistically reliable.

Trip rates by trip purpose and by area type are shown in tables 3 through 6. As seen from the tables, the Kentucky standard sample rates agree with the non-Kentucky sample values relatively well for the home-based work purpose for all area types; no significant difference was observed at the 0.05 level of significance. For home-based other trips, the Kentucky standard sample rates are also close to the others in urbanized areas and rural areas; again, no significant difference was observed at the 0.05 level of significance. However, large differences exist for the urban clusters, where the Kentucky standard sample rates are lower than non-Kentucky rates by approximately 30% to 40% and are significantly different from all of them at the 0.05 level of significance. More significant differences exist for the nonhome-based trips. Except for the urban cluster category, the Kentucky standard sample nonhome-based trip rate is also much lower than and significantly different from all non-Kentucky rates, both weighted and unweighted, for all area types. Again, this overall significant difference is probably due to the low rates in the urban cluster category.

**Average Trip Lengths**

Average motorized person-trip lengths by trip purpose were calculated for the national sample, the similar-SE-states sample, and the Kentucky standard sample and included both weekday and weekend trips. Table 7 presents the average trip lengths from the three datasets. It was found that the Kentucky standard sample produced longer trips than the national and similar-SE-states samples for all trip purposes. The *t*-test shows that the home-based other and nonhome-based trip lengths from the Kentucky standard sample are significantly different from those in the national and similar-SE-states samples at the 0.05 level of significance. This is not surprising, because Kentucky has large rural areas where trip lengths tend to be longer than those in nonrural areas. However, due to the small sample size, the average trip lengths from the Kentucky standard sample had larger standard errors of the mean (discussed in greater detail later).

#### Kentucky Add-On Sample Trip Rates and Average Trip Lengths

**Trip Rates**

As introduced above, the Kentucky add-on sample was randomly collected from four counties in the state. Two of the four counties (Carter and Scott) are in MSAs with populations of 300,000 and 500,000, respectively. Although the sample is not statewide and lacks households in urbanized areas, it still partially reflects the socioeconomic and travel characteristics of Kentucky residents. This, along with a larger sample size, makes the add-on data an appealing source of additional information. Trip rates by county, as well as in total, developed from the add-on sample are shown in table 8.

All-county trip rates are also presented in tables 2 through 6 along with rates from the other samples. As can be seen in table 2, the urban cluster all-purpose trip rates from the add-on sample are much higher than those from the Kentucky standard sample: 8.81 vs. 6.16 for unweighted and 9.31 vs. 5.73 for weighted. In rural areas, the rates from the two samples are close to each other. Compared with the all-purpose trip rates from the non-Kentucky samples, the Kentucky add-on sample rates agree relatively well in urban clusters. However, they are slightly lower in rural areas and more similar to the Kentucky standard sample rates.

Broken down by trip purpose, add-on sample home-based work rates are in good agreement with rates from all other samples. The home-based other trip rates from the add-on sample match well in urban clusters but are slightly lower in rural areas. The add-on sample nonhome-based trip rates are all lower than non-Kentucky samples but higher than the Kentucky standard sample, especially in urban clusters (tables 5 and 6).

Based on the above observations, the Kentucky add-on sample overall appears to provide more reasonable information than does the Kentucky standard sample for urban clusters and rural areas.

**Average Trip Lengths**

Average trip lengths by county and trip purpose from the Kentucky add-on sample are shown in table 9. Edmonson County produced the longest trips for all trip purposes and Pulaski County produced the shortest trips on average. This travel pattern was found to be consistent with area development patterns. Edmonson County is the most rural of the four counties and Pulaski the most urban. However, they all produced longer trips than the areas included in the national and similar-SE-states samples, as shown in table 7. They even produced longer home-based work and home-based other trips than the Kentucky standard sample. Because statewide models, as well as county and small urban area models, are typically used for studying travel demand and transportation systems in rural areas, the add-on sample provides useful data for developing those models.

### DATA TRANSFER AND BAYESIAN UPDATING

Based on the above analysis, the advantages and disadvantages of both the Kentucky standard sample and the add-on sample can be summarized as follows:

- The Kentucky standard sample is representative of the entire state, but due to its small sample size large uncertainty exists with the trip rates and average trip lengths derived from the sample.
- The Kentucky add-on sample produces statistically more reliable trip rates and average trip lengths, but due to the way the data were collected the sample does not represent the whole state.

Therefore, data updating techniques were considered to integrate the two data sources into a new set of data, which takes advantage of the Kentucky standard sample in reflecting travel characteristics statewide and the Kentucky add-on sample in providing greater certainty with data values derived from it.

#### Literature Review

While the transferability of travel demand model parameters, especially those of discrete choice models, has been studied extensively for years (Atherton and Ben-Akiva 1976; Badoe and Miller 1995), it appears that the transferability of transportation planning data has not been much investigated, even though transportation professionals have been using transferred travel data for many years in many places. One of the most typical examples may be the application of national averages of trip rates, trip length distributions, etc., as default values (NCHRP 1998) in the development of travel demand models for small to medium-sized urban areas, where funding was not available for collecting local-specific travel data. In this case, an underlying assumption that people may not be aware of is that those national average values are assumed to be transferable to the study area. If those data are applied to the study area without any adjustments, the transfer is considered a *full* transfer. However, in order to produce reasonable model validation statistics, a common practice for model developers is to adjust the initially adopted national data based on "professional judgment." With adjustments, the data transfer is considered a *partial* transfer. The most critical aspect of a partial transfer is the transfer methodology.

Wilmot and Stopher (2001) conducted research on the transferability of transportation planning data. They stated that disaggregate data at the trip level are very context-specific and therefore intrinsically untransferable, but aggregate data that express the general travel behavior of individuals collectively have a much better chance of being transferable. In their study, they used Bayesian updating to update national averages of trip rates, trip length frequency distributions, and mode shares with local data derived from a small travel survey conducted in Baton Rouge, Louisiana. The updated values and the national averages were then compared with the data values derived from the 1995 Nationwide Personal Transportation Survey Baton Rouge add-on sample, which is much larger in size than the updating sample.

Wilmot and Stopher (2001) found that: 1) transportation planning data (e.g., trip rates, mode shares, and trip length distributions) at certain aggregate levels can be transferred from multiple sources and combined into a single set of updated data; and 2) the data created by updating the transfer data with a small sample of local data were found to be improved over the *fully* transferred data, which were the national averages in their study. They also found that Bayesian updating appears to be a feasible method for data transfer and updating. With Bayesian updating, the influence of all contributory data sources is incorporated into the newly created data. Thus, our study used Bayesian updating to combine the Kentucky standard sample with the Kentucky add-on sample to develop a set of new trip rates and average trip lengths.

#### Theory of Bayesian Updating

The method of Bayesian updating is based on Bayes' Theorem, which has been widely used for statistical inference (Berry and Lindgren 1990). It starts with prior information and a measure of certainty regarding the prior information. When new sample data are available they are incorporated with the prior into a new answer, which is also called the posterior. With more sample data, the uncertainty regarding the new answer diminishes and the following answers improve.

Since both the prior and the updating sample are normally distributed and the variance is known, the mean and variance of the posterior can be expressed as a function of the mean and variance of the prior and the updating data in the functional form as shown in the equations below (Atherton and Ben-Akiva 1976). This functional form is known as a normal-normal conjugate prior. The posterior produced with this function is also normally distributed.

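$$\theta_{\text{updated},i} = \frac{\dfrac{\theta_{\text{prior},i}}{\sigma^2_{\text{prior},i}} + \dfrac{\theta_{\text{updating},i}}{\sigma^2_{\text{updating},i}}}{\dfrac{1}{\sigma^2_{\text{prior},i}} + \dfrac{1}{\sigma^2_{\text{updating},i}}}$$

and

$$\sigma^2_{\text{updated},i} = \frac{1}{\dfrac{1}{\sigma^2_{\text{prior},i}} + \dfrac{1}{\sigma^2_{\text{updating},i}}}$$

where

*θ*_{i} is the mean of data item *i*; and

*σ*^{2}_{i} is the variance of the mean of data item *i*.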

Subscript meanings: *updated* represents the updated dataset; *prior* represents the prior dataset; and *updating* represents the dataset used for updating the prior dataset. As a note, since transfer data serve as the prior in data updating, the terms "prior" and "transfer data" are used interchangeably in this paper.

In the above equations, data item values from the data sources are weighted by the inverse of their variance to achieve a value for the updated data item. This feature is appealing, because data values with greater certainty (i.e., with smaller variance) contribute more to the estimate of the updated data item than those with less certainty (i.e., with larger variance).

When the prior data are reliable, a relatively small sample can be used for updating. However, in the cases where the prior data are not very reliable, a relatively large updating sample is more likely needed. In both cases, the variance of the posterior data will always be less than that of both the prior and the updating sample.
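The inverse-variance weighting described above can be sketched in a few lines. This is a minimal illustration of the standard normal-normal update under the stated assumptions (normal prior and updating data with known variances of the mean), not the authors' implementation; the function name `bayes_update` and the input values are hypothetical.

```python
def bayes_update(mean_prior, var_prior, mean_updating, var_updating):
    """Combine a prior mean and an updating mean, weighting each by the
    inverse of its variance (the variance of the mean, i.e. SE squared)."""
    w_prior = 1.0 / var_prior
    w_updating = 1.0 / var_updating
    var_updated = 1.0 / (w_prior + w_updating)
    mean_updated = (w_prior * mean_prior + w_updating * mean_updating) * var_updated
    return mean_updated, var_updated


# Hypothetical values: an uncertain prior (SE 0.8) and a more certain
# updating sample (SE 0.4).
mean_post, var_post = bayes_update(6.0, 0.8 ** 2, 9.0, 0.4 ** 2)
print(mean_post, var_post)
```

With these inputs the updated mean lies much closer to the updating value than to the prior, because the updating sample has the smaller variance, and the updated variance is smaller than either input variance.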

Bayesian updating has been studied and used in the past to update the parameters of travel demand models and the method has been reported to perform well (Atherton and Ben-Akiva 1976). Wilmot and Stopher's study (2001) appears to be the first one that applied this technique in a data transferability study. They reported that updating of transfer data with local information using Bayesian updating seems to improve transfer data consistently.

#### Bayesian Updating for Kentucky Samples

This study adopted the Bayesian updating method tested by Wilmot and Stopher (2001) and emphasized the strength of the method in combining contributory datasets. The Kentucky standard sample data were used as the prior and the add-on sample data as the updating data in the process. The two datasets were combined to produce a new, improved dataset in which the advantages of one input dataset compensated for the disadvantages of the other.

**Updated Trip Rates**

We combined the Kentucky standard sample with the Kentucky add-on sample using Bayesian updating, and the updated trip rates were compared with the trip rates from the non-Kentucky samples to investigate the improvement. The updated trip rates are shown in tables 10 and 11 along with the rates from the Kentucky standard sample, the Kentucky add-on sample, and the non-Kentucky samples. These tables show that the updated trip rates are improved overall. For instance, compared with the trip rates from the similar-SE-states sample, in 10 of the 12 mean cells, updated trip rates are closer than the Kentucky standard sample rates. In particular, the values for urban clusters are improved substantially and look more reasonable. Only two rates appear to deviate more, though slightly. From the data uncertainty perspective, all updated trip rates have much lower standard errors of the mean than the rates from the Kentucky standard sample, which indicates that uncertainty in the updated trip rates has been substantially reduced.

**Updated Average Trip Lengths**

The major issue identified earlier concerning the average trip lengths from the Kentucky standard sample was the relatively large standard errors of the means, which indicate less confidence with the means. To achieve lower standard errors, the Kentucky standard sample average trip lengths were combined with the Kentucky add-on sample values using Bayesian updating. Table 12 presents the updated average trip lengths along with their standard errors. As stated earlier, the standard error (or variance) of the posterior data produced by the Bayesian updating process is always less than that of both the prior and the updating sample. Therefore, as expected, for unweighted average trip lengths, the standard error of the mean decreased from 1.692 to 0.422 for home-based work trips, from 0.690 to 0.289 for home-based other trips, and from 0.685 to 0.388 for nonhome-based trips. Similar patterns were also observed for the weighted trip lengths. While the process reduced uncertainty in the average trip lengths, the values of the updated average trip lengths stayed close to both the Kentucky standard sample and the add-on sample values but are longer than the national averages and the similar-SE-states sample values. This may reflect the actual trip length characteristics of Kentucky.
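To see how the reported reductions in standard error relate to the inverse-variance rule, the sketch below backs out the updating-sample standard error that would be implied by each reported prior and updated standard error. It assumes the normal-normal update applies exactly to the unweighted values quoted above; the implied figures are illustrative arithmetic, not values reported in the paper, and the helper name `implied_updating_se` is hypothetical.

```python
import math


def implied_updating_se(se_prior, se_updated):
    """Standard error of the updating sample implied by
    1/se_updated^2 = 1/se_prior^2 + 1/se_updating^2."""
    precision = 1.0 / se_updated ** 2 - 1.0 / se_prior ** 2
    return math.sqrt(1.0 / precision)


# Unweighted standard errors of the mean quoted in the text (prior, updated).
reported = {
    "home-based work": (1.692, 0.422),
    "home-based other": (0.690, 0.289),
    "nonhome-based": (0.685, 0.388),
}

implied = {
    purpose: implied_updating_se(se_prior, se_updated)
    for purpose, (se_prior, se_updated) in reported.items()
}
for purpose, se in implied.items():
    print(f"{purpose}: implied updating SE = {se:.3f}")
```

In each case the implied updating standard error is larger than the updated one but smaller than the prior one, consistent with the add-on sample contributing most of the precision.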

### CONCLUSIONS

Based on the above analysis, the following conclusions are drawn:

- The 2001 NHTS Kentucky standard sample is partially useable for travel demand modeling, although its sample size is not large and it produces unreasonably low trip rates for urban clusters.
- The Kentucky add-on sample produces statistically more reliable trip rates and average trip lengths although it is only partially representative of Kentucky.
- The Kentucky add-on sample can be used to enhance significantly the NHTS Kentucky standard sample for statewide, rural county, and small urban area travel demand modeling. The trip rates generated by combining the Kentucky add-on sample rates with the Kentucky standard sample rates using Bayesian updating showed substantial and consistent improvements. The Bayesian updating process also improved average trip lengths by reducing uncertainty in the data.
- The findings of this study support the conclusions drawn by Wilmot and Stopher (2001) that transportation planning data can be improved by combining two or more reasonable datasets into one, and Bayesian updating seems to be a feasible method to combine or update transportation planning data.

### NOTE

At the time this study was conducted, Mr. Mei and Mr. Cooney were with Wilbur Smith Associates and Mr. Bostrom was with the Kentucky Transportation Cabinet.

### REFERENCES

Atherton, T.J. and M.E. Ben-Akiva. 1976. Transferability and Updating of Disaggregate Travel Demand Models. *Transportation Research Record* 610:12–18.

Badoe, D.A. and E.J. Miller. 1995. Analysis of the Temporal Transferability of Disaggregate Work Trip Mode Choice Models. *Transportation Research Record* 1493:1–11.

Berry, D.A. and B.W. Lindgren. 1990. *Statistics: Theory and Methods.* Pacific Grove, CA: Brooks/Cole Publishing Co.

Kentucky Transportation Cabinet (KYTC). 2002. 2001 NHTS Add-On Program: Final Report and Data Codebook. October.

National Cooperative Highway Research Program (NCHRP). 1998. *Travel Estimation Techniques for Urban Planning,* NCHRP Report 365. Washington, DC: National Academy Press.

U.S. Department of Commerce (USDOC), U.S. Census Bureau. 2000. Geographic Terms and Concepts. Available at http://www.census.gov.

Wilmot, C.G. and P.R. Stopher. 2001. Transferability of Transportation Planning Data. *Transportation Research Record* 1768:36–43.

### END NOTES

1. For more information, see the 2001 NHTS website: http://nhts.ornl.gov/2001/index.shtml.

### ADDRESSES FOR CORRESPONDENCE

Corresponding author: B. Mei, Institute for Transportation Research and Education, North Carolina State University, 909 Capability Drive, Suite 3600, Raleigh, NC 27606. E-mail: bmei@ncsu.edu

T. Cooney, Pima Association of Governments, 177 N. Church Ave, #405, Tucson, AZ 85701. E-mail: tCooney@pagnet.org

N. Bostrom, Wilbur Smith Associates, 465 East High Street, Suite 100, Lexington, KY 40507. E-mail: rbostrom@wilbursmith.com