Accounting for Uncertainty in Estimates of Total Traffic Volume: An Empirical Bayes Approach

Accounting for Uncertainty in Estimates of Total Traffic Volume: An Empirical Bayes Approach

Gary Davis*
University of Minnesota

Shimin Yang
The St. Paul Companies

Abstract

Predicted or estimated totals of traffic volume over one or more years are required in both highway pavement and safety engineering. While current recommended practices contain guidance on how to generate such estimates, they are less clear on how to quantify the uncertainty attached to the estimates. This paper describes an initial solution to this problem. Empirical Bayes methods are used to compute quantiles of the predictive probability distribution of the traffic total at a highway site, given a sample of daily traffic volumes from that site. Probable ranges and their associated probability values are readily found, and a point prediction of the total traffic can be obtained as the median, or 50th percentile, of the predictive distribution. The method of derivation can also be used to find the predictive density and the moments of the predictive distribution if needed. No data other than those routinely collected by statewide traffic monitoring programs are needed. A test comparing computed 90% credible intervals for annual traffic volume with the corresponding actual volume at 48 automatic traffic recorder sites showed that the actual coverage percentage was not significantly different from the nominal 90% value.

Introduction

In engineering design, it is sometimes necessary to work with variables whose values are not known with certainty. In such cases, a rational compromise between over- and under-design requires first the determination of a probable range for a variable's outcome and then a design to accommodate values in this range. Implementation hinges on an acceptably accurate assessment of this probable range because if the assessed range is too narrow, the likelihood of a failure will be unacceptably high, but if the assessed range is too broad, resources will be expended in anticipation of improbable events. In addition, the users of scientific and engineering measurements often expect an assessment of the uncertainty attached to those measurements. The need for an assessment of uncertainty is revealed in the reporting of error bounds associated with opinion survey estimates, in the U.S. Supreme Court's recommendation that judges consider "known or potential rate of error" when determining the admissibility of expert scientific testimony (Foster and Huber 1999), and in the recommendation by the American Association of State Highway and Transportation Officials (AASHTO) that the "precision and bias" attached to traffic volume measurements be assessed and reported (AASHTO 1992).

The ability to justify an uncertainty assessment becomes especially important when either particular values of an estimate or the uncertainty range itself could be used by partisans to justify or oppose a controversial policy (Sarewitz and Pielke 2000). In a discussion of scientific predictions and social policy, Stewart (2000) found it useful to distinguish between two sources of uncertainty, which he called aleatory and epistemic uncertainties. Aleatory uncertainty arises when the outcomes of interest are governed by physically random processes that are at least in principle capable of generating stable long-run relative frequencies. Epistemic uncertainty, on the other hand, arises when our knowledge of underlying states of nature is incomplete. This duality in the notion of uncertainty appears to date back to the origin of modern ideas concerning probability (Hacking 1975) and is arguably the source of the current debate between Bayesian and frequentist views on the foundations of statistics (Howson and Urbach 1993).

Estimates or forecasts of the total traffic volume on a section of road for one or more years are used in both pavement design and traffic safety analysis and are also used to generate state- and nationwide estimates of total distance traveled. These are most often computed by multiplying an estimate of the road's mean daily traffic (MDT) volume by the number of days in the desired time horizon. For example, in pavement design (AASHTO 1993) estimates of MDT for each vehicle class are multiplied by 365 to obtain estimated yearly traffic totals, which are in turn used to predict the traffic loading over a pavement's design life. In traffic safety, the traffic exposure at a road site is computed by multiplying an estimate of MDT by the total number of days over which traffic accidents have been counted. For an intersection, these traffic totals are then summed over the intersection's approaches to give the total entering vehicles. For a highway section, the traffic total is multiplied by the section's length, producing an estimate of total vehicle kilometers of travel. Clearly, aleatory uncertainty is attached to an estimate of total traffic because, even if we knew a site's MDT exactly, the estimated traffic total and the actual total would likely differ, due to the unpredictable decisions of individual travelers. Epistemic uncertainty is present when the true MDT is not known exactly but has been estimated from a sample of daily traffic counts. AASHTO's (1993) recommended pavement design method explicitly allows for aleatory uncertainty as one of the components making up the overall variation term in the pavement design equation; however, epistemic uncertainty is not addressed. In traffic safety, current practices address the uncertainty in estimated accident rates due to the random nature of accident counts but do not appear to consider the contributions of either aleatory or epistemic uncertainty when estimating exposure (see Parker 1991).

Draper (1995) has illustrated how an accounting of multiple sources of uncertainty can be accomplished using Bayesian statistical methods, and in this paper we will consider the problem of assessing the uncertainty attached to estimates or forecasts of the total traffic volume. The second section will illustrate, using a simple example, how a Bayesian approach can be used to combine the contributions of aleatory and epistemic uncertainties into one assessment. The reasoning illustrated in that section will then be applied in the next section to develop an expression for the predictive distribution of a traffic total, which can be used to compute both point and interval estimates. The fourth section will then describe an initial empirical evaluation of this estimation method, and the final section will present conclusions. The development described in the third section draws heavily on past research into statistical models for time series of daily traffic counts (Davis and Guan 1996) and on weak convergence results for sums of lognormal random variables (Marlow 1967).

Illustrating the Sources of Uncertainty in Total Traffic Estimation

Consider the problem of estimating the total traffic volume over a period of N days, using a short count collected with a portable traffic counter. Let

zt = traffic volume on day t, t=1,...,N,

lowercase z superscript {uppercase n} equals the summation from lowercase t equals 1 to uppercase n (lowercase z subscript {lowercase t}) the total traffic volume over days t=1,...,N,

zl = traffic count on the lth sample day, l=1,...,n,

lowercase z bar equals (1 divided by lowercase n) times the summation from lowercase l equals 1 to lowercase n (lowercase z subscript {lowercase l}) the sample average,

lowercase z equals [lowercase z subscript {1} to lowercase z subscript {lowercase n}],n-dimensional vector containing the sample counts.

The ultimate objective is to estimate the total traffic volume zN from the traffic count sample z. For this example, we will assume that the daily traffic counts are independent and identically distributed normal, random variables with common mean μ;  and common variance lowercase sigma superscript {2}. The mean value μ;  is the site's MDT, and we will assume that it is unknown. However, to avoid technical sidetracks, we will assume the variance of the daily volumes, lowercase sigma superscript {2}, is known. We will also assume, for simplicity, that the sampled days are not part of the time period over which the total traffic is desired. Aleatory uncertainty will be present even when the MDT is known exactly. It is straightforward to show that given μ  and lowercase sigma superscript {2}, the probability distribution of the total traffic count zN is normal with mean equal to uppercase n times lowercase mu and variance equal to uppercase n times lowercase sigma superscript {2}. But, as noted earlier, we do not generally know μ  but must estimate its value from the count sample, leading to epistemic uncertainty. If we assume that, prior to collecting our sample z, our uncertainty concerning the MDT parameter is characterized by a prior probability density function (lowercase mu), Bayes Theorem can be used to find the probability density characterizing this uncertainty after collecting the sample z. That is

function (lowercase mu given lowercase z and lowercase sigma superscript {2}) equals [function (lowercase z given lowercase mu and lowercase sigma superscript {2}) times function (lowercase mu)] divided by [integral of function (lowercase z given lowercase mu and lowercase sigma superscript {2}) times (function lowercase mu) times (derivative of lowercase mu)]

In particular, if function (lowercase mu) is noninformative in the sense of being uniformly distributed on the real line, it can be shown that the posterior uncertainty concerning μ  is characterized by a normal distribution with mean equal to the sample average and variance equal to lowercase sigma superscript {2} divided by lowercase n (Box and Tiao 1973). The joint effect of aleatory and epistemic uncertainty can then be determined by treating μ  as a nuisance parameter and integrating it out of the joint density for zN and μ

function (lowercase z superscript {uppercase n} given lowercase z and lowercase sigma superscript {2}) equals the integral from negative infinity to positive infinity [function (lowercase z superscript {uppercase n} given lowercase mu and lowercase sigma superscript {2}) times function (lowercase mu given lowercase z and lowercase sigma superscript {2}) times (derivative of lowercase mu)]

(Draper 1995). Here function (lowercase z superscript {uppercase n} given lowercase z and lowercase sigma superscript {2}) denotes the predictive probability density of the total traffic given the count sample, while function (lowercase z superscript {uppercase n} given lowercase mu and lowercase sigma superscript {2}) denotes the predictive probability density of the total traffic when the MDT is known. For this example, closed form evaluation of (2) is possible (Box and Tiao 1973), leading to the conclusion that the predictive probability density function (lowercase z superscript {uppercase n} given lowercase z and lowercase sigma superscript {2}) is normal with mean equal to theand variance given by

variance (lowercase z superscript {uppercase n} given lowercase z and lowercase sigma superscript {2}) equals [uppercase n times (uppercase n plus lowercase n) times lowercase sigma superscript {2}] divided by lowercase n equals uppercase n times lowercase sigma superscript {2} plus uppercase n superscript {2} times (lowercase sigma superscript {2} divided by lowercase n)

In this case, the contributions to the total variance attributable to aleatory and epistemic uncertainty can be separated, with uppercase n times lowercase sigma superscript {2} as variance due to aleatory uncertainty and uppercase n superscript {2} times (lowercase sigma superscript {2} divided by lowercase n) as variance due to uncertainty concerning the MDT. Interestingly, while the variance due to aleatory uncertainty increases linearly with the number of days in the traffic total, the variance due to epistemic uncertainty increases quadratically. To see the relative contributions of these sources, suppose that we seek to predict one year's total traffic volume using a 10-day sample count and that the daily traffic volume has an MDT of 1,000 vehicles per day and a coefficient of variation equal to 0.1. The day-to-day variance would be lowercase sigma superscript {2} equals (.1 times 1,000) superscript {2} equals 10,000, so the standard deviation due to aleatory uncertainty would be square root (365 times 10,000) which equals 1,910 vehicles. The standard deviation due to epistemic uncertainty would be square root (365 times (10,000 divided by 10)) equal to 11,540 vehicles. This example illustrates how epistemic uncertainty can be the dominant source of error and that neglecting its contribution can lead to a serious overstatement of a prediction's precision.

Because the main objective of this paper is to show how a more complete accounting of uncertainty can be added to current traffic monitoring practices, we describe these practices next. The chief purpose of a traffic monitoring program is to generate estimates of MDT on each of a jurisdiction's road segments. Ideally, this is done with year-round counting on each segment, but the cost of installing and maintaining such a comprehensive traffic monitoring system is prohibitive. Therefore, MDT estimates on the majority of road segments are obtained from samples gathered using portable traffic counters. Since traffic volumes vary systematically throughout the course of the year as well as across the days of the week, averages computed from short count samples are generally biased estimates of full year averages. However, if the magnitude of the bias is known, adjustments can be made. To determine these adjustments, most states employ a small number of permanent automatic traffic recorders (ATRs) placed on a representative sample of road segments. The daily traffic counts from the ATRs are used to cluster the ATRs into factor groups such that daily traffic volumes at sites in a factor group show similar seasonal and day-of-week variation patterns. The ATR counts are also used to estimate the seasonal and day-of-week factors characterizing each group. Each non-ATR road section is then assigned to one of these factor groups, and the variation factors characterizing the assigned group are used to adjust the short-count sample, providing a better estimate of the section's MDT. It is currently recommended that a suitably adjusted short count of 48 hours produces an estimate of MDT with acceptable precision (AASHTO 1992; USDOT FHWA 1995).

At least two sources of potential error can cause an estimated MDT to differ from a section's true (but unknown) MDT: sampling error, arising anytime the estimate is based on less than a complete census of the section's traffic volumes, and adjustment error, arising if the factors used to adjust the short-count sample differ from those which actually describe the sampled section's variation pattern. In a recent review of MDT estimation, Davis (1997b) pointed out that much of the earlier research used to justify the use of short counts for estimating MDT tended to underestimate the potential effect of adjustment error, and an analysis of the potential contributions from the two error sources indicated that adjustment error can plausibly be two to three times larger than sampling error. This analysis was consistent with recent empirical work by Sharma et al. (1996), which investigated the effect of adjustment errors in estimating MDT, as well as with work which highlighted the error caused by applying adjustment factors developed for traffic dominated by passenger cars to estimate the MDT of heavy trucks (Hallenbeck and Kim 1994; Cambridge Systematics 1994). The review also pointed out that both sampling and adjustment error can be explicitly accounted for within a hierarchical statistical model of the process generating the daily traffic counts and that this model can be used to develop an empirical Bayes (EB) estimator of MDT, which does not require that each roadway section be assigned a priori to a factor group (Davis and Guan 1996; Davis 1997a). Rather, a structure similar to that shown in equations (1) and (2) is used, in which the sample data are used to assess the posterior probabilities the sample site belongs to each factor group. The MDT is then estimated as a weighted average, with the factor group probabilities providing the weights. The next section describes how this hierarchical modeling approach can be extended to develop a method for computing the predictive distribution of a site's total traffic volume, rather than its MDT, given a traffic count sample at the site.

Deriving the Predictive Distribution for Total Traffic Volume

Using Bayes Theorem to assess the information provided by a sample and then integrating out nuisance parameters, the two steps exemplified in equations (1) and (2) provide the basic framework for deriving the predictive distribution of traffic totals from more realistic assumptions. In the above example, we derived a predictive probability density function (lowercase z superscript {uppercase n} given lowercase z) but here we will focus on the corresponding predictive distribution function function (lowercase z superscript {uppercase n} given lowercase z) equals probability (uppercase z superscript {uppercase n} less than or equal to lowercase z superscript {uppercase n} given lowercase z) The distribution function is more useful from a practical standpoint since it leads immediately to a method for finding the quantiles of the predictive distribution by solving equations of the form function (lowercase z superscript {uppercase n} given lowercase z) equals lowercase alpha Since the expression for the cumulative distribution function turns out to have the form of a weighted average, an argument similar to that employed below could also be used to find the predictive probability density or the moments of the predictive distribution.

Aleatory Uncertainty

We will develop an explicit expression for function (lowercase z superscript {uppercase n} given lowercase z) in several steps. As in the example, the total traffic count zN is determined as the sum of the daily counts zt, so that the statistical properties of the daily traffic volumes will determine the form of the conditional distribution function(lowercase z superscript {uppercase n} given lowercase z) The first step is to characterize the statistical properties of the daily traffic volumes zt. In order to develop plausible statistical models for the day-to-day fluctuations in traffic volumes at a particular site, a detailed analysis of daily traffic counts from 50 ATRs in Minnesota was undertaken, as described in Davis (1997a) and Davis and Guan (1996). This work indicated that variations in daily traffic volumes could be described using a lognormal regression model of the form

lowercase y subscript {lowercase t} equals lowercase u plus (summation from i equals 1 to 12 (uppercase delta subscript {lowercase t, i} times lowercase m subscript {lowercase k, i})) plus (summation from lowercase j equals 1 to 7 (lowercase delta subscript {lowercase t, j} times lowercase w subscript {lowercase k, j}) plus lowercase e subscript {lowercase t})

where

lowercase y subscript lowercase t equals the natural logorithm (lowercase z subscript {lowercase t}), the natural logarithm of a daily count,

u = expected log traffic count on a typical day,

uppercase delta subscript {lowercase t, i} equals 1, if the count zt was made during month i, i=1,...,12, (0 otherwise)

mk,i = correction term for month i, characteristic of factor group k,

lowercase delta subscript {lowercase t, j} = 1, if the count zt was made on day-of-week j, j=1,...,7, (0 otherwise)

wk,j = correction term for day-of-week j, characteristic of factor group k,

et = random error.

If we let lowercase beta subscript {lowercase k} equals (lowercase m subscript {lowercase k, 1} to lowercase m subscript {lowercase k, 12}, lowercase w subscript {lowercase k, 1} to lowercase w subscript {lowercase k, 7}) denote a column vector containing the monthly and day-of-week adjustment terms for factor group k, and lowercase x subscript {lowercase t} equals [uppercase delta subscript {lowercase t, 1} to uppercase delta subscript {lowercase t, 12}, lowercase delta subscript {lowercase t, 1} to lowercase delta subscript {lowercase t, 7}] equation (4) can be written in a slightly simpler form:

lowercase y subscript {lowercase t} equals lowercase u plus (lowercase x subscript {lowercase t} times lowercase beta subscript lowercase k) plus lowercase e subscript {lowercase t}

In the above model, the mean value of the logarithm of the daily count varies according to month and day-of-week, and the magnitude of these variations depends on the factor group to which the site of interest belongs. Analysis of the regression residuals obtained after estimating the adjustment terms indicated that the error terms et were not independent but showed day-to-day dependencies, which could be described by a multiplicative autoregressive (AR) model of the form

lowercase e subscript {lowercase t} equals (lowercase phi subscript {1} times lowercase e subscript {lowercase t minus 1}) plus (lowercase phi subscript {7} times lowercase e subscript {lowercase t minus 7}) minus (lowercase phi subscript {1} times lowercase phi subscript {7} times lowercase e subscript {lowercase t minus 8}) plus lowercase a subscript {lowercase t}

Here the at are independent, identically distributed, normal, random variables with zero mean and common variance, and lowercase phi subscript {1} and lowercase phi subscript {7} are site-specific autoregressive coefficients.

The above model is parameterized by u, a mean-value parameter, lowercase beta, the monthly and day-of-week adjustment terms, the variance of the et terms, which we will denote by lowercase sigma superscript {2} and the autoregressive coefficients lowercase phi subscript {1} , lowercase phi subscript {7} In the next step, we will assume we know the values of these parameters but nothing else about the site. Properties of lognormal random variables (Shimizu and Crow 1988) can be used to show that the expected value of the total traffic volume is

lowercase mu subscript {uppercase n} equals expected value (lowercase z superscript {uppercase n}) equals summation from t equals 1 to uppercase n (exponential (lowercase u plus (lowercase x subscript {lowercase t} times lowercase beta subscript {lowercase k}) plus (lowercase sigma superscript {2} divided by 2)))

and the variance of the total traffic volume is

lowercase nu superscript {2} subscript {uppercase n} equals [lowercase e superscript {2 times lowercase mu plus lowercase sigma superscript {2}} times (lowercase e superscript {lowercase sigma superscript {2}} minus 1) times summation from lowercase t equals 1 to uppercase n [lowercase e superscript {2 times lowercase x subscript {lowercase t} times lowercase beta subscript {lowercase k}}]] plus (lowercase e superscript {2 times lowercase mu plus lowercase sigma superscript {2}}) times [summation from lowercase t equals 1 to uppercase n [lowercase e superscript {lowercase x subscript {lowercase t} times lowercase beta subscript {lowercase k}} times (summation from lowercase t equals 1 to uppercase n not including lowercase s [lowercase e superscript {lowercase x subscript {lowercase s} times lowercase beta subscript {lowercase k}} times (lowercase e superscript {lowercase rho subscript {lowercase t, s} times lowercase sigma superscript {2}} minus 1))]]]

Here lowercase rho subscript {lowercase t, s} denotes the correlation between et and es, which can be computed from lowercase phi subscript {1} , lowercase phi subscript {7} (see Brockwell and Davis 1991). Since zN is a sum of lognormal random variables, characterizing its probability distribution turns out to be very difficult even for the case N=2, and no result for general N is known (Johnson, Kotz, and Balakrishnan 1994). In most situations of practical interest, however, N will be equal to the number of days in one or more years, so an asymptotic approximation might prove useful. In the Appendix, a result due to Marlow (1967) is adapted to show that the cumulative distribution function of the random variable

(lowercase mu subscript {uppercase n} divided by lowercase nu subscript {uppercase n}) times natural logorithm (lowercase z superscript {uppercase n} divided by lowercase mu subscript {lowercase n})

converges to that of a standard normal random variable, implying that for large N, loge(zN) can be approximated by a normal random variable with mean equal to natural logorithm (lowercase mu subscript {uppercase n}) and variance equal to lowercase nu superscript {2} subscript {uppercase n} divided by lowercase mu superscript {2} subscript {uppercase n} This, in turn, supports using a lognormal approximation for zN,

probability [lowercase z superscript {uppercase n} less than or equal to lowercase z bar given (lowercase u, lowercase beta, lowercase phi subscript {1}, lowercase phi subscript {7}, lowercase sigma superscript {2})] equals probability [natural logorithm (lowercase z superscript {uppercase n}) less than or equal to natural logorithm (lowercase z bar) given (lowercase u, lowercase beta, lowercase phi subscript {1}, lowercase phi subscript {7}, lowercase sigma superscript {2})] is approximately equal to standard normal distribution [(natural logorithm (lowercase z bar) minus natural logorithm (lowercase mu subscript {uppercase n}) divided by (lowercase nu subscript {uppercase n} divided by lowercase mu subscript {uppercase n}]

where uppercase phi (.) denotes the standard normal distribution function.

Epistemic Uncertainty

The sample z contains two types of information concerning zN. On the one hand, it provides information concerning the model parameters. In principle, this information could be summarized by the posterior distribution function function (lowercase u, lowercase beta, lowercase phi subscript {1}, lowercase phi subscript {7}, lowercase sigma superscript {2} given lowercase z) On the other hand, the correlated noise in equation (6) implies that any given daily count zt is correlated with its neighbors, so knowing zt allows us to more accurately predict neighboring values. If the noise equation (6) is stationary, and if the sample counts are sufficiently separated in time from the counts comprising zN (such as might occur when trying to predict the total volume for 1999 using a sample taken in 1997) we can take the sample and the total count to be independent of each other. Then the sample provides information on the total only by providing information on the model parameters. The predictive distribution function (lowercase z superscript {uppercase n} given lowercase z) could then be found using a generalization of equation (2):

function (lowercase z superscript {uppercase n} given lowercase z) equals integral of function (lowercase z superscript {uppercase n} given lowercase u, lowercase beta, lowercase phi subscript {1}, lowercase phi subscript {7}, lowercase sigma superscript {2}) times derivative (lowercase z superscript {uppercase n} given lowercase u, lowercase beta, lowercase phi subscript {1}, lowercase phi subscript {7}, lowercase sigma superscript {2} given lowercase z) given

When the sample counts are correlated with counts comprising the total zN, the expression (11) will only be approximate, with the approximation deteriorating with increasing overlap between the sample counts and the counts entering into the total. In principle, smoothing algorithms could be used to include dependency on z in (7) and (8), so that a z-dependent asymptotic approximation could be developed, but the necessary computational labor appears to be substantial. For the situations commonly encountered in highway and safety engineering, the number of counts entering into the total will be large compared to the size of the sample (for example, one or more years for N, compared to a 48-hour or two one-week counts for n), so that most of the aggregating counts will be separated from the sample counts by at least one month. For these cases, we conjecture that equation (11) will provide a suitably accurate approximation. Some empirical evidence supporting this conjecture will be presented in the fourth section.

The final steps involve characterizing the distribution function (lowercase u, lowercase beta, lowercase phi subscript {1}, lowercase phi subscript {7}, lowercase sigma superscript {2}) given lowercase z and then finding a computationally feasible way to evaluate the (multidimensional) integral in (11). It turns out, however, that this problem is very similar to the problem of computing Bayes estimates of mean daily traffic described in Davis (1997a) and Davis and Guan (1996), and a similar solution can be employed here. The essence of this approach is to assess the prior uncertainty concerning the model parameters, and then use Bayes Theorem to account for information provided by the data sample.

As in Davis (1997a), we will assume that the highway agency has divided its road segments into a set of m factor groups and that estimates of the adjustments factors for each group, lowercase beta subscript {lowercase k}, k=1,...,m, are available. We will further assume that the agency maintains a total of M ATRs and that for each ATR estimates of the covariance parameters (lowercase phi subscript {1 lowercase p}, lowercase phi subscript {7 lowercase p}, lowercase sigma subscript {lowercase p} where lowercase p equals 1 to uppercase m are also available. Straightforward procedures for computing these estimates from ATR data, using commonly available software packages, are described in Davis (1997a). Prior to collecting any data for a site, we will assume that our uncertainty concerning that site's parameters is captured by the prior probability distribution

function (lowercase u, lowercase beta, lowercase sigma, lowercase phi subscript {1}, lowercase phi subscript {7}) equals [1 divided by uppercase m times the summation from lowercase p equals 1 to uppercase m [(uppercase i subscript {lowercase sigma subscript {lowercase p}, lowercase phi subscript {1 lowercase p},  lowercase phi subscript {7 lowercase p}} times (lowercase sigma, lowercase phi subscript {1}, lowercase phi subscript {7}))] times (1 divided by lowercase m times the summation from lowercase k equals 1 to lowercase m [(uppercase i subscript {lowercase beta {lowercase k}}) times (lowercase beta))]]

where Ib(x) denotes the indicator function

Ib(x) = 1, if x=b, (0 otherwise).

Basically, this prior assumes that before collecting data we are completely uncertain of the value of u in the sense that our prior probability is uniformly distributed on the real line. For the adjustment term lowercase beta, we are certain it takes on one of the values lowercase beta subscript {1} to lowercase beta subscript {lowercase m} characterizing our factor groups, but we are equally uncertain which of these is correct. Similarly, for (lowercase phi subscript {1}, lowercase phi subscript {7}, lowercase sigma) we are certain one of the sets of values estimated from our ATR sites is correct, but prior to collecting data we are equally uncertain which one. Completing the specification of this prior by generating estimates of the lowercase beta subscript {lowercase k} and (lowercase phi subscript {1 lowercase p}, lowercase phi subscript {7 lowercase p}, lowercase sigma superscript {2} subscript {lowercase p} from ATR data results in an empirical Bayes (EB) method, in the sense of Padgett and Robinson (1978). That is, empirical distributions from samples are used to form the priors.

Because the logarithms of the traffic counts are normal random variables, the likelihood function of the sample is easy to specify. Letting y denote the vector containing the logarithms of the sample counts and V denote the correlation matrix of y (which can be computed once the value of the AR parameters lowercase phi subscript {1} and lowercase phi subscript {7} is known), then if we knew the site-specific values for the parameters lowercase u, lowercase beta, lowercase sigma, lowercase phi subscript {1}, and lowercase phi subscript {7}, the likelihood of the sample could be computed using the appropriate multivariate normal density.

function (lowercase y given lowercase beta, lowercase u, lowercase sigma, lowercase phi subscript {1}, lowercase phi subscript {7}) equals [((2 times lowercase pi) superscript {minus lowercase n divided by 2}) times (lowercase sigma superscript {minus lowercase n}) times (absolute value (uppercase v) superscript {minus lowercase n divided by 2}) times [exponential (((negative one divided by two times lowercase sigma superscript {2}) times (lowercase y minus (uppercase x times lowercase beta subscript {lowercase k}) minus (lowercase u times 1 subscript lowercase n)) superscript {'} times (uppercase v superscript {-1}) times (lowercase y minus (uppercase x times lowercase beta subscript {lowercase k}) minus (lowercase u times 1 subscript {lowercase n})))]]

Here X is a matrix, of dimension N19, each row having elements equal to 0 or 1, according to the month and day-of-week of the corresponding sample count, while 1n is an n-dimensional column vector with each element equal to 1.0.

Predictive Distribution of Total Traffic

Applying Bayes Theorem to the prior and likelihood to obtain the posterior distribution for the parameters, substituting this into (11), and performing the indicated integrations produces, after some tedious algebra,

probability [lowercase z superscript {uppercase n} less than or equal to lowercase z bar given lowercase z] equals probability [lowercase y superscript {uppercase n} less than or equal to lowercase y tilda given lowercase z] is approximately equal to [summation from lowercase p equals 1 to uppercase m of (lowercase sigma superscript {negative (lowercase n minus 1)}) times (absolute value (uppercase v subscript {lowercase p}) superscript negative one half) times (((1 superscript {'} subscript {lowercase n}) times (uppercase v superscript {minus 1}) times (1 subscript {lowercase n})) superscript {minus one half})] times [summation from lowercase k equals 1 to lowercase m of standard normal distribution function [((lowercase y tilda minus lowercase y hat subscript {lowercase p, k}) divided by (uppercase s subscript {lowercase p, k})) times (exponential (uppercase s superscript {2} subscript {lowercase p, k}) divided by (2 times lowercase sigma superscript {2} subscript {lowercase p}))] divided by [summation from lowercase p equals 1 to uppercase m of (lowercase sigma superscript {negative (lowercase n minus 1)}) times (absolute value (uppercase v subscript {lowercase p}) superscript negative one half) times (((1 superscript {'} subscript {lowercase n}) times (uppercase v superscript {minus 1}) times (1 subscript {lowercase n})) superscript {minus one half})] times [summation from lowercase k equals 1 to lowercase m (exponential (uppercase s superscript {2} subscript {lowercase p, k}) divided by (2 times lowercase sigma superscript {2} subscript {lowercase p}))]

where yN=loge(zN), Vp denotes the sample correlation matrix computed using lowercase phi subscript {1 lowercase p}, lowercase phi subscript {7 lowercase p} and

(equation 1 of 4) uppercase s superscript {2} subscript {lowercase p k} equals [(lowercase y minus (uppercase x times lowercase beta subscript {lowercase k}) minus (lowercase y bar subscript {lowercase p k} times 1 subscript {lowercase n})) superscript {prime}] times [uppercase v superscript {minus 1} subscript {lowercase p}] times [lowercase y minus (uppercase x times lowercase beta subscript {lowercase k}) minus (lowercase y bar subscript {lowercase p k} times 1 subscript {lowercase n})]; (equation 2 of 4) lowercase y bar subscript {lowercase p k} equals [(1 superscript {'} subscript {lowercase n} times uppercase v superscript {minus 1} subscript {lowercase p}) times (lowercase y minus (uppercase x times lowercase beta subscript {lowercase k})] divided by [((1 superscript {'} subscript {lowercase n} times uppercase v superscript {minus 1) subscript {lowercase p} times 1 subscript {lowercase n})]; (equation 3 of 4) lowercase y tilda subscript {lowercase p k} equals lowercase y bar subscript {lowercase p k} plus (lowercase sigma superscript {2} subscript {lowercase p} divided by 2) plus natural logorithm (summation from lowercase t equals 1 to uppercase n of (lowercase e superscript {lowercase x subscript {lowercase i} times lowercase beta subscript {lowercase k}); (equation 4 of 4) lowercase s subscript {lowercase p k} equals (lowercase sigma superscript {2} subscript {lowercase p} divided by (1 superscript {'} subscript {lowercase n} times uppercase v superscript {minus 1) subscript {lowercase p} times 1 subscript {lowercase n}) plus (lowercase nu superscript {2} subscript {uppercase n, lowercase k p} divided by (lowercase mu subscript {uppcase n, lowercase k p}) superscript {2}] superscript minus one half

where lowercase nu superscript {2} subscript {uppercase n, lowercase k p} and lowercase mu subscript {uppercase n, lowercase k p} are as defined in (7) and (8) but evaluated using lowercase beta subscript {lowercase k} and (uppercase phi subscript {1 lowercase p} uppercase phi subscript {7 lowercase p} lowercase sigma subscript {lowercase p}) The distribution given in (14) is a finite mixture of normal distributions where the weights given to the mixture components are the posterior probabilities that the sampled site has adjustment factors and covariance parameters characteristic of each the m factor groups and each of the M ATR sites. Although the expressions in (14) and (15) appear rather forbidding, the implied computations are readily carried out on a personal computer.

A Calibration Test

As noted above, the distribution (14) approximates the predictive distribution of a total traffic count, the approximation being appropriate when predicting the total of a large number of days (for example, a year or more) from a small sample (for example, two weeks or less). In an earlier study, Davis (1997a) used traffic counts from the year 1992 from 50 ATRs in outstate Minnesota to estimate monthly and day-of-week adjustment terms for the Minnesota Department of Transportation's (Mn/DOT) 3 outstate factor groups, as well as covariance parameters for each of the 50 ATRs. These estimates were then used to construct the discrete prior distributions for lowercase beta subscript {lowercase k} and (lowercase phi subscript {1} lowercase phi subscript {7} lowercase sigma subscript {lowercase p}), giving m=3 and M=50. In addition, daily counts from the year 1991 were available for 48 ATRs, and for each of these ATRs a sample consisting of a one-week count from the month of March and a one-week count from the month of July was drawn. The 1992 data were used for estimation, and 1991 data were used for validation because more ATRs had good data in 1992. A MATLAB (Mathworks 1992) program for evaluating (14) was written, and then for each of the 48 ATRs, the 5th and 95th percentile points of the predictive distribution of the logarithm of the 1991 total traffic volume were computed by embedding this routine inside MATLAB's root-finding algorithm. Finally, the logarithm of the total 1991 actual traffic volume was also computed for each ATR. The results of these computations are displayed in tables 1 through 3.(table 1, table 2, table 3)

Note that the 5th and 95th percentile points describe the bounds of a 90% credible interval, and, clearly, if a large number of actual traffic counts fell outside the bounds of our intervals, we would have evidence for inaccurate prediction. On the other hand we would still expect a few actual counts to fall outside our bounds. If the intervals caught all actual volumes, we would be inclined to believe that the computed credible intervals were too large. If the approximation is acceptably accurate, we would expect the actual count to fall outside the bounds 10% of the time, and a test of the adequacy of the estimated credible bounds can be made by treating the number of missed totals as the outcome of a binomial random variable with 48 trials and a hypothesized miss probability of p=0.1. Inspection of the tables shows that for 8 of the ATRs (2, 8, 12, 204, 208, 217, 218, and 226) the actual count fell outside the estimated bounds, for a total of 8 binomial "successes." Since the probability of obtaining 8 or more successes by chance is 0.102, this result is not inconsistent with the hypothesis that equation (14) provides a reasonable approximation of the predictive distribution.

Conclusions

Predicted or estimated traffic totals are required in both highway pavement and safety engineering and are used to produce statewide and nationwide estimates of total distance traveled. Although recommended practices exist for estimating traffic totals as part of a traffic monitoring program, it is less clear how we should characterize the uncertainty associated with these estimates. This paper describes an initial solution to this problem, in which empirical Bayes methods are used to compute the quantiles of a traffic total's predictive distribution, given a sample of daily traffic volumes. Probable ranges and their associated probability values are readily found, and, if desired, a point prediction of the total traffic can be obtained as the median, or 50th percentile, of the predictive distribution. The method of derivation can also be used to find the predictive density and the moments of the predictive distribution. No data are required beyond that routinely collected by statewide traffic monitoring programs, and the estimates of the factor group adjustment parameters can be computed using standard linear regression methods. All other computations have been successfully implemented as MATLAB macros.

In conclusion, almost all engineering decisions must be made in the face of uncertainty, and the art of successful engineering requires cost-effective hedging against this uncertainty. It was argued earlier that standard methods for predicting total traffic ignore potentially important sources of error, and, hence, understate the resulting uncertainty characterizing estimates and predictions. Many of the statistical procedures used in highway engineering date to the middle part of the 20th century and are based on simplified statistical models adapted to the computational constraints of those times. Statistical science has advanced considerably since then, and these advances can support and encourage the use of more realistic models in highway engineering. This paper proposes a modest step in this direction by providing a computationally practical method which accounts for uncertainty in traffic volume predictions. Of course, the importance of hedging against uncertainty depends on the consequences of error, and, fortunately, so far the consequences attached to using mistakenly precise traffic forecasts have not been too severe. Whether or not this state of affairs continues is of course another uncertain prediction about the future.

Acknowledgments

The authors would like to thank Mark Flinner of Mn/DOT for providing the traffic-count data used in this study. This research was sponsored by Mn/DOT. However, all facts, conclusions and opinions expressed here are solely the responsibility of the authors and do not necessarily reflect the views of Mn/DOT.

References

AASHTO. 1992. Guidelines for Traffic Data Programs. Washington, DC.

____. 1993. Guide for Design of Pavement Structures. Washington, DC.

Box, G. and G. Tiao. 1973. Bayesian Inference in Statistical Analysis. New York: Wiley and Sons.

Brockwell, P. and R. Davis. 1991. Time Series: Theory and Methods. New York: Springer-Verlag.

Cambridge Systematics. 1994. Use of Data from Continuous Monitoring Sites. Report to Federal Highway Administration, U.S. Department of Transportation, Washington, DC.

Davis, G. 1997a. Estimation Theory Approach to Monitoring and Updating Average Daily Traffic. Report 97-05, Minnesota Department of Transportation, St. Paul, MN.

Davis, G. 1997b. Accuracy of Estimates of Mean Daily Traffic: A Review. Transportation Research Record 1593:12-6.

Davis, G. and Y. Guan. 1996. Bayesian Assignment of Coverage Count Locations to Factor Groups and Estimation of Mean Daily Traffic. Transportation Research Record 1542:30-7.

Draper, D. 1995. Assessment and Propagation of Model Uncertainty. Journal of the Royal Statistical Society B 57:45-97.

Foster, K. and P. Huber. 1999. Judging Science: Scientific Knowledge and the Federal Courts. Cambridge: MIT Press.

Gallant, R. and H. White. 1988. A Unified Theory of Estimation and Inference for Nonlinear Dynamic Models. New York: Basil-Blackwell.

Hacking, I. 1975. The Emergence of Probability. Cambridge: Cambridge University Press.

Hallenbeck, M. and S-G. Kim. 1994. Truck Loads and Flows, Task A Final Technical Report. Washington State Transportation Research Center (TRAC), Seattle, WA.

Howson, C. and P. Urbach. 1993. Scientific Reasoning: The Bayesian Approach. Chicago: Open Court.

Johnson, N., S. Kotz, and N. Balakrishnan. 1994. Continuous Univariate Distributions, Volume 1. Second edition. New York: Wiley and Sons.

Marlow, N. 1967. A Normal Limit Theorem for Power Sums of Independent Random Variables. Bell Systems Technical Journal 46:2081-9.

Mathworks. 1992. MATLAB Reference Guide, Version 4. Mathworks, Inc.

Padgett, W. and J. Robinson. 1978. Empirical Bayes Estimators of Reliability for Lognormal Failure Models. IEEE Transactions on Reliability R-27:223-336.

Parker, M. 1991. Highway Safety Engineering Studies: Procedural Guide. Report No. FHWA-HI-88-039. Federal Highway Administration, U.S. Department of Transportation, Washington, DC.

Sarewitz, D. and R. Pielke. 2000. Prediction in Science and Policy. Prediction: Science, Decision Making, and the Future of Nature. Washington, DC: Island Press.

Sharma, S., B. Gulati, and S. Rizak. 1996. Statewide Traffic Volume Studies and Precision of AADT Estimates. ASCE Journal of Transportation Engineering 122:430-9.

Shimizu, K. and E. Crow. 1988. History, Genesis, and Properties. Lognormal Distributions: Theory and Applications. New York: Marcel Dekker.

Stewart, T. 2000. Uncertainty, Judgment, and Error in Prediction. Prediction: Science, Decision Making, and the Future of Nature. Washington, DC: Island Press.

U.S. Department of Transportation (USDOT), Federal Highway Administration (FHWA). 1995. Traffic Monitoring Guide, Third Edition. Washington, DC.

Appendix

Weak Convergence of Sums For a Class of Correlated Lognormal Random Variables

As above, let zt, t=1,...,N denote a sequence of lognormal random variables with summation from lowercase t equals 1 to uppercase n (lowercase z subscript {lowercase t} denoting their sum. uppercase phi (lowercase x) denotes the standard normal distribution function. As before, we will assume that the zt follow the model

natural logorithm (lowercase z subscript {lowercase t}) equals lowercase u plus (lowercase x subscript {lowercase t} times lowercase beta) plus lowercase e subscript {lowercase t}

and the error terms {et} follow a stationary p-order autoregressive (AR(p)) process. Marlow (1967) showed that if there exists sequences of positive real number lowercase mu subscript {lowercase n} and lowercase nu subscript {lowercase n} such that

limit as uppercase n approaches infinity (lowercase nu subscript {uppercase n} divided by lowercase mu subscript {uppercase n} equals 0

limit as uppercase n approaches infinity (probability ((lowercase mu subscript {uppercase n}) divided by lowercase mu subscript {uppercase n}) less than or equal to lowercase x) equals uppercase phi (lowercase x),

then

limit as uppercase n approaches infinity probability [(lowercase mu subscript {uppercase n} divided by lowercase nu subscript {uppercase n}) times (natural logorithm (lowercase z subscript {uppercase n} divided by lowercase mu subscript {uppercase n})) less than or equal to lowercase x] equals uppercase phi (lowercase x)

To verify conditions (a) and (b), we will impose the restriction that the monthly and day-of-week factors for any given day are bounded from above and also bounded away from zero. That is, there exist constants uppercase delta subscript {1} and uppercase delta subscript {2}, such that

zero less than uppercase delta subscript {1} less than or equal to lowercase e superscript (lowercase x subscript {lowercase {t} times lowercase beta subscript {lowercase k}} less than or equal to uppercase delta subscript {2} less than infinity

for all t and k. It is then possible to show that

lowercase nu subscript {uppercase n} equals [(lowercase e superscript {(2 times lowercase mu) plus lowercase sigma superscript {2}}) times (lowercase e superscript {lowercase sigma superscript {2}} minus 1) summation from lowercase t equals 1 to uppercase n (lowercase e superscript {2 times lowercase x subscript {lowercase t} times lowercase beta subscript {lowercase k}}) plus (lowercase e superscript {(2 times lowercase mu) plus lowercase sigma superscript {2}}) times (summation from t equals 1 to uppercase n (lowercase e superscript {lowercase x subscript {lowercase t} times lowercase beta subscript {lowercase k} (summation lowercase s not equal to 1 (lowercase e superscript {x subscript {lowercase s} time lowercase beta subscript {lowercase k}}) (lowercase e superscript {lowercase rho subscript {lowercase t, s} times lowercase sigma {2}} minus 1)))] superscript {one half} less than or equal to [(uppercase n time lowercase e superscript {(2 times lowercase mu) plus (lowercase sigma superscript {2}) plus (2 times lowercase delta subscript {2})) times (1 plus lowercase gamma))] superscript {one half} equals order (uppercase n superscript {one half})

where

lowercase gamma equals summation from lowercase k equals 1 to infinity (lowercase e superscript {lowercase rho subscript {lowercase k} times lowercase sigma {2}} minus 1) less than infinity

whenever the lowercase rho subscript {lowercase k} are the autocorrelations for a stationary AR(p) process. Similarly,

function uppercase n (lowercase e superscript {lowercase u plus (lowercase sigma superscript {2} divided by 2) plus lowercase delta subscript {1}}) less than or equal to lowercase mu subscript {uppercase n} equals (lowercase e superscript {lowercase u plus (lowercase sigma superscript {2} divided by 2)}) summation from lowercase t equals 1 to uppercase n (lowercase e superscript {lowercase x subscript {lowercase t} times lowercase beta}) less than or equal to function uppercase n (lowercase e superscript {lowercase u plus (lowercase sigma superscript {2} divided by 2) plus lowercase delta subscript {2}})

so that

lowercase nu subscript {uppercase n} divided by lowercase mu subscript {uppercase n} equals order (uppercase n superscript {minus one half})

and condition (a) is satisfied.

If the daily counts zt were independent, we could use either the Lyaponuv or Lindbergh central limit theorems to verify condition (b), as was done by Marlow (1967). A more general central limit theorem, allowing for dependence of the sort generated by the AR(p) model for the errors et , is stated in Theorem 5.3 of Gallant and White (1988, 76). In particular, if we can show that

(equation 1 of 4) fourth central moment [lowercase z subscript {lowercase t} minus expected value (lowercase z subscript {lowercase t})] less than or equal to lowercase delta less than infinity, for all lowercase t; (equation 2 of 4) supremum subscript {lowercase t} (second central moment [lowercase z subscript {lowercase t} minus expected value (lowercase z subscript {lowercase t} given lowercase e subscript {lowercase t plus lowercase m},...,lowercase e subscript {lowercase t},...,lowercase e subscript {lowercase t minus lowercase m})]) equals order (1 divided by lowercase m); (equation 3 of 4) the noise processes lowercase e subscript {lowercase t} is (lowercase alpha minus mixing) of size -4; (equation 4 of 4) lowercase nu superscript {minus 2} subscript {uppercase n} equals order (1 divided by uppercase n)

then condition (b) will be satisfied, and we are done.

1) Let lowercase omega equals exponential (lowercase sigma superscript {2}), and since zt is lognormal, its fourth central moment is known, so that

fourth central moment [lowercase z subscript {lowercase t} minus expected value (lowercase z subscript {lowercase t})] equals [lowercase omega superscript {2} times ((lowercase omega minus 1) superscript {2}) times (lowercase omega superscript {2} plus (2 times lowercase omega superscript {3}) plus (3 times lowercase omega superscript {2} minus 3)) times (lowercase e superscript {4 times (lowercase u plus lowercase x subscript {lowercase t} time lowercase beta)})] superscript {one fourth} less than or equal to [lowercase omega superscript {2} times ((lowercase omega minus 1) superscript {2}) times (lowercase omega superscript {2} plus (2 times lowercase omega superscript {3}) plus (3 times lowercase omega superscript {2} minus 3)) times (lowercase e superscript {4 times lowercase u}) times (lowercase e superscript {lowercase delta subscript {2}})] superscript {one fourth} less than infinity

2) This condition is satisfied trivially since

expected value (lowercase z subscript {lowercase t} given lowercase e subscript {lowercase t plus lowercase m},...,lowercase e subscript {lowercase t},...,lowercase e subscript {lowercase t minus lowercase m}) equals expected value (lowercase z subscript {lowercase t} given lowercase e subscript {lowercase t}) equals lowercase z subscript {lowercase t}

implies

supremum subscript {lowercase t} (second central moment [lowercase z subscript {lowercase t} minus expected value (lowercase z subscript {lowercase t} given lowercase e subscript {lowercase t plus lowercase m},...,lowercase e subscript {lowercase t},...,lowercase e subscript {lowercase t minus lowercase m})]) equals 0

3) This condition is also satisfied trivially since the fact that the noise process {et} is a stationary AR(p) process implies that it is lowercase alpha minus mixing of all orders.

4) This follows from the fact, demonstrated above, that lowercase nu superscript {2} subscript {uppercase n} equals order (uppercase n).

Address for Correspondence

Gary A. Davis, Department of Civil Engineering, University of Minnesota, 122 CivE, 500 Pillsbury Drive SE, Minneapolis, MN 55455. Email: drtrips@tc.umn.edu.