Modeling Trip Duration for Mobile Source Emissions Forecasting
Metropolitan area trip duration distributions are an important input for estimating area-wide running loss emissions, operating mode fractions, and vehicle-miles traveled accumulated on local roads in the region. This paper discusses the formulation and implementation of a methodology for modeling trip durations. The approach develops a log-linear regression model of trip duration as a function of trip purpose, time of day of the trip start, and other land-use and socio-demographic characteristics of the zone of trip start, using vehicle trip data from household travel surveys and supplementary zonal demographic and land-use data. A distinguishing characteristic of the methodology is the straightforward manner in which model parameters estimated from vehicle trip data can be applied to obtain zonal-level trip duration distributions. The modeling framework is applied to develop trip duration distributions for the Dallas-Fort Worth area of Texas.
BACKGROUND AND SIGNIFICANCE
The Intermodal Surface Transportation Efficiency Act of 1991 and the Clean Air Act Amendments of 1990 require localities not meeting the National Ambient Air Quality Standards set by the Environmental Protection Agency (EPA) to demonstrate area-wide conformity with mobile source emissions budgets established in their respective state implementation plans. For such conformity analyses, the Mobile Source Emissions Factor Model (MOBILE) is generally used.1
The emissions factor models use traffic-related data as inputs. Regional vehicle trip duration distribution data are one type and are important for several reasons. First, the trip duration distribution provides information for developing trip duration activity parameters used by the MOBILE emissions factor model to estimate running loss emissions. Running loss emissions are evaporative emissions that escape from a vehicle while the engine is operating (from spots where the vehicle's evaporative/purge system has become inoperative). Due to greater heating of the engine fuel and evaporative system on longer trips, running loss emissions continue to increase as a function of trip duration until the emissions reach a plateau at a trip duration of about 50 to 60 minutes (Glover and Brzezinski 2001). Second, the trip duration distribution enables the estimation of operating mode fractions, which are needed by MOBILE5 to calculate emissions rates. Third, the trip duration distribution can be used to predict the vehicle-miles traveled (VMT) on local roads in the region.
Trip duration is likely to depend on various factors such as the trip purpose, the time of day the trip began, and other land-use and socio-demographic characteristics of the zone of the trip start. In this paper, we formulate and implement a methodology for modeling trip durations using vehicle trip data from household travel surveys and supplementary zonal demographic and land-use data. The implementation is demonstrated in the context of mobile source emissions analysis for the Dallas-Fort Worth area in Texas.
The next section reviews earlier studies relevant to our subject and the motivation for our research. We then discuss the development of the model estimation and application framework. The following section focuses on data sources and data assembly procedures, after which we present the empirical results. The next section discusses issues related to integrating the trip duration model with travel demand models, and we conclude the paper with an evaluation of the model.
LITERATURE REVIEW AND MOTIVATION FOR THE STUDY
Running Loss Emissions
The methodology to estimate running loss emissions differs from MOBILE5 to MOBILE6. In MOBILE5, running loss emissions are modeled as a direct function of the input temperature, fuel volatility, average speed, and trip duration. The procedure for calculating the running loss emissions entails partitioning the vehicle trip duration into 6 time duration bins (less than 10 minutes, 11 to 20 minutes, 21 to 30 minutes, 31 to 40 minutes, 41 to 50 minutes, and 51 minutes and longer) and obtaining the proportion of VMT accumulated by trips that fall into each time duration bin (these proportions are referred to as the trip duration activity parameters).
Within MOBILE5, the running loss emissions value of an average vehicle trip is calculated as the sum of the product of the emissions factors associated with each time duration bin (embedded within MOBILE5) and the corresponding trip duration activity parameter. The product of these average running loss emissions and the number of trips per day represents the running loss emissions level. The user can then accept default daily running loss emissions values available within MOBILE5 (developed using default trip-time distributions representing national average conditions) or develop region-specific estimates by using a local set of trip duration activity parameters. The MOBILE5 manual suggests using area-specific trip duration activity parameters to more accurately estimate running loss emissions.
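As a rough illustration of the weighted-sum calculation described above, the following Python sketch combines six bin-level emissions factors with a set of trip duration activity parameters. The emissions factors, activity parameters, and daily trip count are made-up placeholders (the actual factors are embedded in MOBILE5 and are not reproduced here), and the function name is hypothetical.

```python
# Illustrative only: the bin-level emissions factors below are placeholders,
# not the values embedded in MOBILE5.
BIN_LABELS = ["<10", "11-20", "21-30", "31-40", "41-50", ">=51"]  # minutes

def average_running_loss(emission_factors, activity_params):
    """Weighted sum of bin emissions factors, weighted by the share of VMT
    accumulated by trips in each duration bin (the activity parameters)."""
    if len(emission_factors) != len(activity_params):
        raise ValueError("one emissions factor is needed per duration bin")
    if abs(sum(activity_params) - 1.0) > 1e-9:
        raise ValueError("activity parameters must sum to 1")
    return sum(f * p for f, p in zip(emission_factors, activity_params))

# Hypothetical inputs (arbitrary emissions units per trip).
factors = [0.2, 0.4, 0.6, 0.8, 1.0, 1.1]
local_params = [0.10, 0.25, 0.25, 0.20, 0.10, 0.10]

avg = average_running_loss(factors, local_params)
daily_total = avg * 1000  # times a hypothetical 1,000 trips per day
```

Swapping `local_params` for a default national set would show how sensitive the total is to the assumed trip duration distribution.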
MOBILE6 advances the state-of-the-art and practice by providing activity parameters for each of 14 time periods in a day and by distinguishing between weekdays and weekends. The default MOBILE6 hourly activity estimates are based on an EPA survey of 168 vehicles and are invariant across geographic regions and trip purpose categories. Thus, as in MOBILE5, EPA recommends the use of locally estimated trip duration activity parameters whenever possible. However, to our knowledge, there has been no earlier attempt in the literature to develop such locally estimated trip duration parameters.
In summary, using trip duration activity parameters developed from local data for estimating running loss emissions constitutes an important improvement over using the default values embedded in the MOBILE emissions factor model. In this paper, we present a methodology to develop zone-specific trip duration activity parameters that vary by time-of-day and trip purpose using a trip duration model estimated from local data.
Operating Mode Fractions
Operating mode fractions are an important input to MOBILE5 for estimating mobile source emissions. There are two dimensions associated with operating mode fractions; one is the start mode of vehicle trips (cold versus hot) and the second is the running mode of vehicle trips (transient versus stabilized). In an earlier paper, we focused on the start mode of trips (Nair et al. 2002). Trip duration modeling, the focus of this paper, affects the latter dimension of the operating mode, that is, the running mode of trips. To the extent that running mode fractions can be more accurately estimated using a trip duration model calculated from local data, such a model can contribute toward improved mobile source emissions forecasting.
EPA defines the transient mode of operation as all vehicle operations before 505 seconds after the start of a trip and the stabilized mode as all operations after 505 seconds of a trip. EPA recommends the following default values for running mode fractions: transient (47.9%) and stabilized (52.1%). Most metropolitan planning organizations (MPOs) accept these default running mode fractions. However, these default values were developed over 20 years ago and earlier research (USEPA 1993) suggests that these values may no longer adequately represent overall vehicle emissions control performance. In addition, the default fractions do not vary by trip purpose, time of day, or regional land-use and socio-demographic characteristics.
Few studies have attempted to develop locally estimated running mode fractions of trips. Brodtmen and Fuce (1984) used field data obtained by direct on-road measurement of engine conditions to develop running mode fractions in New Jersey. Ellis et al. (1978) analyzed origin-destination data from travel surveys in Alabama to develop aggregate measures of running mode fractions. Frank et al. (2000) developed transient and stabilized mode fractions based on vehicle trip times, using the Puget Sound Panel Survey. Chatterjee et al. (1996) and Venigalla et al. (1999) used a network-based approach for modeling running modes, in which they traced the elapsed time of vehicles from trip origins during the assignment of trips to the highway network. Allen and Davies (1993) have similarly used the ASSIGN module of MINUTP, a commercially available planning model, to determine trips operating in the transient mode for the southern New Jersey area.
A limitation of the above studies is that they compute a single set of running mode fractions for an entire state (or for aggregate regions within a state) and for various times of day and trip purposes. In this paper, we estimate a trip duration model using local data from a metropolitan region and present a methodology to use this estimated model to develop running mode fractions that vary by zone within the region, time of day, and trip purpose. In addition, our methodology allows for the estimation of running mode fractions for travel on local roads.
VMT on Local Roads
Local roads are usually not included in the travel demand model networks used by most MPOs; thus, the travel speeds and volumes required to calculate the VMT on local links are unavailable. Many MPOs simply calculate the VMT as a percentage (typically about 10%) of the VMT on all other roads and use it in developing their emissions inventories. This method is rather ad hoc in nature and can result in VMT estimates quite different from the actual values. A few MPOs calculate the VMT on local roads as a product of the total intrazonal trips for each zone (obtained from the origin-destination trip-interchange matrices at the end of trip distribution) and an average intrazonal trip length parameter. The average intrazonal trip length parameter is typically calculated as a function of the total area of the zone. While this method is a substantial improvement over using a percentage of VMT on nonlocal roads, it is still limited by the restrictive nature of variation of the intrazonal trip length parameter. In particular, in this method, the intrazonal trip length (and, therefore, local VMT) does not vary by trip purpose, time of day, and zonal spatial attributes (other than zonal area).
Our study develops the intrazonal trip length as a function of time of day, purpose, and zonal attributes. We accomplish this by estimating a trip duration model and then multiplying the predicted intrazonal trip duration by an estimate of average speed on local links (it would be more straightforward to develop a direct model of intrazonal trip length, but most household surveys collect data only on trip duration and not trip length).
Our modeling approach uses vehicle trip data from household travel surveys and zonal demographic and land-use data from supplementary sources. The approach develops the distribution of the duration of trips using a log-linear regression model. The use of a log-linear form for trip duration guarantees the non-negativity of trip time in application of the model.
The application step of the model predicts the trip duration distributions for each traffic analysis zone in a metropolitan region and for each combination of time of day and trip purpose. An important characteristic of the proposed method is the ease with which the estimated models using vehicle trip data can be immediately applied to obtain zonal-level trip-time distributions.
Let $q$ be the index for vehicle trip, $t$ be the index for time of day, $i$ be the index for activity purpose ($i$ may be defined as a function of the activity purposes at both ends of the trip $q$), and $z$ be the index for zone. Define $\omega_{qti}$ to be a dummy variable taking the value 1 if vehicle trip $q$ occurs in time period $t$ with trip purpose $i$, and 0 otherwise; define $\delta_{qz}$ as another dummy variable taking the value 1 if vehicle trip $q$ is produced from zone $z$, and 0 otherwise. Define $I_q$ to be a variable that takes the value 1 if vehicle trip $q$ is intrazonal, and 0 otherwise. Let $x_z$ be a vector of zonal attributes.
We assume the trip duration to be log-normally distributed in the population of trips, and develop a linear regression model for the duration as a function of trip purpose, time of day, and land-use and socio-demographic characteristics of the zone of trip production. Let $d_q$ be the duration of vehicle trip $q$. Then, we write the log-linear regression equation for the trip duration as
$$\ln d_q = \eta + \sum_{t,i} \alpha_{ti}\,\omega_{qti} + \lambda' \sum_{z} \delta_{qz}\, x_z + I_q \left[ \chi + \sum_{t,i} \zeta_{ti}\,\omega_{qti} + \rho' \sum_{z} \delta_{qz}\, x_z \right] + \varepsilon_q \tag{1}$$
where
$\eta$ is the generic constant to be estimated;
the $\alpha_{ti}$'s ($t = 1, 2, \ldots, T$; $i = 1, 2, \ldots, I$) are scalars capturing the effects of time of day and activity purpose on trip duration (these scalars are to be estimated);
$\lambda$ is a vector of parameters representing the effects of the characteristics of the zone of trip production (the vector $\lambda$ is also to be estimated).
$\chi$, $\zeta_{ti}$, and $\rho$ are similar to $\eta$, $\alpha_{ti}$, and $\lambda$, respectively, but are introduced as specific to intrazonal trips (note that $I_q$ takes the value 1 if vehicle trip $q$ is an intrazonal trip, and 0 otherwise). $\varepsilon_q$ is a normally distributed random error term introduced to complete the statistical specification.
In equation (1) above, we have not allowed interactions between zonal attributes $x_z$ and time of day/trip purpose combinations $\omega_{qti}$; however, this is purely for notational convenience and for ease in presentation of the model application step. Interactions between $x_z$ and $\omega_{qti}$ can be included within the model structure without any additional conceptual or estimation complexity. Similarly, the notation structure implies full interactions of time and trip purpose (as defined by the dummy variable $\omega_{qti}$), though more restrictive structures such as single dimensional effects without interaction can be imposed by appropriately constraining the $\alpha_{ti}$ and $\zeta_{ti}$ scalars across the different time/trip purpose combinations.
The reader will note that the inclusion of the intrazonal dummy variable, and interactions of this variable with exogenous variables, allows us to accommodate separate trip duration distributions for intrazonal vehicle trips and interzonal vehicle trips. The model from equation (1) can be estimated using any commercially available software with a linear regression module.
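To make the estimation step concrete, here is a minimal Python sketch that fits a stripped-down version of the log-linear regression by ordinary least squares. It is not the specification estimated in this paper: the data are simulated, the only regressors are a peak-period dummy and the intrazonal dummy, and the "true" coefficient values, the error standard deviation, and the helper name `ols` are all assumptions chosen for the demonstration.

```python
import random

def ols(X, y):
    """Ordinary least squares via the normal equations (X'X) b = X'y,
    solved with Gauss-Jordan elimination. Fine for a handful of regressors."""
    k = len(X[0])
    # Build the augmented matrix [X'X | X'y].
    A = [[sum(row[a] * row[b] for row in X) for b in range(k)]
         + [sum(row[a] * yi for row, yi in zip(X, y))] for a in range(k)]
    for col in range(k):
        # Partial pivoting for numerical stability.
        pivot = max(range(col, k), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        p = A[col][col]
        A[col] = [v / p for v in A[col]]
        for r in range(k):
            if r != col:
                f = A[r][col]
                A[r] = [vr - f * vc for vr, vc in zip(A[r], A[col])]
    return [A[r][k] for r in range(k)]

# Simulated trips; the "true" values are assumptions for this demonstration.
random.seed(0)
TRUE = {"eta": 2.8, "alpha_peak": 0.3, "chi": -0.9}
SIGMA = 0.6
X, y = [], []
for _ in range(5000):
    peak = 1.0 if random.random() < 0.4 else 0.0     # time-of-day dummy
    intra = 1.0 if random.random() < 0.15 else 0.0   # intrazonal dummy I_q
    log_d = (TRUE["eta"] + TRUE["alpha_peak"] * peak
             + TRUE["chi"] * intra + random.gauss(0.0, SIGMA))
    X.append([1.0, peak, intra])
    y.append(log_d)   # the dependent variable is ln(duration)

eta_hat, alpha_hat, chi_hat = ols(X, y)
```

Because the dependent variable is the logarithm of duration, any predicted duration obtained by exponentiating a fitted value is positive, which is the non-negativity property noted earlier.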
Trip Duration Activity Parameters for Running Loss Emissions
The trip duration distribution for any zone in the study area by time period and trip purpose can be predicted in a straightforward manner after estimation of equation (1). The (log) trip duration distribution of interzonal vehicle trips in time $t$ for trip purpose $i$ produced from zone $z$ may be written as
$$\ln d^{a}_{tiz} \sim N\!\left(\eta + \alpha_{ti} + \lambda' x_z,\ \sigma^2\right) \tag{2}$$
The superscript $a$ in the above equation is used to denote interzonal trips. The mean and variance $\sigma^2$ of this distribution can be estimated from the parameter estimates obtained in the estimation stage. The corresponding distribution of intrazonal vehicle trips in time $t$ for trip purpose $i$ in zone $z$ may be written as
$$\ln d^{l}_{tiz} \sim N\!\left(\eta + \chi + \alpha_{ti} + \zeta_{ti} + (\lambda + \rho)' x_z,\ \sigma^2\right) \tag{3}$$
where the superscript $l$ is used to denote intrazonal trips.
The objective in our effort is to obtain the fraction of VMT accrued by trips in each of six trip duration bins (as needed by MOBILE; see the Running Loss Emissions section earlier in this paper) for each zone and for each trip purpose and time-of-day combination. Let $k$ be an index for time-bin ($k = 1, 2, \ldots, 6$), and let bin $k$ be bounded by the continuous trip duration value of $m_{k-1}$ to the left and by $m_k$ to the right. Let $V_k$ be the average speed of trips in time-bin $k$ and let $\pi_z$ be the fraction of trips originating in zone $z$ that are intrazonal.2 Then, the fraction of VMT accrued by interzonal trips in time-bin $k$, during time of day $t$, for trip purpose $i$, produced from zone $z$ ($F^{a}_{tizk}$) can be obtained as (the derivation of the expression is available from the authors)
$$F^{a}_{tizk} = \frac{P^{a}_{tizk}\, D^{a}_{tizk}\, V_k}{W^{a}_{tiz}}, \qquad W^{a}_{tiz} = \sum_{k=1}^{6} P^{a}_{tizk}\, D^{a}_{tizk}\, V_k$$
In the above equation structure, $P^{a}_{tizk}$ represents the proportion of interzonal trips in time period $t$, for trip purpose $i$, produced from zone $z$ that fall in trip duration bin $k$. $D^{a}_{tizk}$ represents the mean trip duration of interzonal trips in time period $t$, for trip purpose $i$, produced from zone $z$ that fall in trip duration bin $k$. The product of $P^{a}_{tizk}$ and $D^{a}_{tizk}$ with $V_k$ represents the VMT accrued by interzonal trips in time period $t$, for trip purpose $i$, produced from zone $z$ that fall in trip duration bin $k$. $W^{a}_{tiz}$ represents the total VMT accrued by interzonal trips in time period $t$, for trip purpose $i$, produced from zone $z$, and is obtained by summing the VMT across all trip duration bins.
The fraction of VMT accrued by intrazonal trips in time-bin $k$, in time $t$, for trip purpose $i$, produced from zone $z$ ($F^{l}_{tizk}$) can be obtained by substituting $d^{l}_{tiz}$ instead of $d^{a}_{tiz}$ in equations (4) through (7).
Finally, the fraction of VMT accrued by all trips in each time-bin $k$, for trip purpose $i$, from zone $z$, during time $t$ is obtained by combining the interzonal and intrazonal quantities, using the interzonal and intrazonal trip shares $1 - \pi_z$ and $\pi_z$.
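The bin-level calculation can be sketched in Python under the log-normal assumption. This is an illustration rather than the authors' derivation (which is not reproduced in the text): it uses the standard partial-expectation result for a log-normal variable, pools interzonal and intrazonal trips by weighting per-trip VMT with the intrazonal trip share, and takes bin bounds of 0, 10, 20, 30, 40, and 50 minutes. All parameter values, bin speeds, and function names are hypothetical.

```python
import math
from statistics import NormalDist

PHI = NormalDist().cdf                                  # standard normal CDF
EDGES = [0.0, 10.0, 20.0, 30.0, 40.0, 50.0, math.inf]   # bin bounds in minutes

def _phi_log(m, mu, sigma, shift=0.0):
    """Phi((ln m - mu - shift) / sigma), handling the limits m = 0 and m = inf."""
    if m <= 0.0:
        return 0.0
    if math.isinf(m):
        return 1.0
    return PHI((math.log(m) - mu - shift) / sigma)

def bin_vmt_per_trip(mu, sigma, speeds_mph):
    """Expected VMT per trip contributed by each bin when ln(d) ~ N(mu, sigma^2).
    Equals (share of trips in bin) x (mean duration in bin) x (bin speed)."""
    mean_all = math.exp(mu + sigma ** 2 / 2.0)
    vmt = []
    for k, speed in enumerate(speeds_mph):
        lo, hi = EDGES[k], EDGES[k + 1]
        # Partial expectation E[d * 1(lo < d <= hi)] of a log-normal, in minutes.
        minutes = mean_all * (_phi_log(hi, mu, sigma, sigma ** 2)
                              - _phi_log(lo, mu, sigma, sigma ** 2))
        vmt.append(minutes / 60.0 * speed)
    return vmt

def vmt_fractions(mu_inter, mu_intra, sigma, speeds_mph, intra_share):
    """VMT share by bin for all trips, pooling interzonal and intrazonal trips
    in proportion to their trip shares (a modeling assumption of this sketch)."""
    inter = bin_vmt_per_trip(mu_inter, sigma, speeds_mph)
    intra = bin_vmt_per_trip(mu_intra, sigma, speeds_mph)
    pooled = [(1.0 - intra_share) * a + intra_share * b
              for a, b in zip(inter, intra)]
    total = sum(pooled)
    return [p / total for p in pooled]

# Hypothetical inputs: log-duration means, a common sigma, bin speeds (mph),
# and a 15% intrazonal trip share.
fractions = vmt_fractions(2.8, 1.9, 0.8, [20, 25, 30, 30, 35, 35], 0.15)
```

The six values in `fractions` play the role of the trip duration activity parameters for one zone, time period, and trip purpose.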
Running Mode Fractions for MOBILE5
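This section presents the method to obtain the proportion of transient and stabilized trips required as an input to MOBILE5. We begin by discussing the approach for interzonal trips; the approach is identical for intrazonal trips, with appropriate replacements to reflect the mean and variance of intrazonal trips.
Let the assumed speed of vehicles be $v$ mph. Let the mean of the distribution of trips of duration less than 8.42 minutes (505 seconds) occurring in time period $t$, with trip purpose $i$, produced from zone $z$ be $D^{a}_{1,tiz}$ and let the corresponding mean of the distribution of trips of duration greater than 8.42 minutes be $D^{a}_{2,tiz}$ ($D^{a}_{1,tiz}$ and $D^{a}_{2,tiz}$ represent the means of the right- and left-truncated normal distributions of trip duration, respectively).
We obtain the analytical expression for $D^{a}_{1,tiz}$ from the mean of a truncated normal distribution (see Greene 1997).
The transient mode VMT accumulated by trips of duration less than (or equal to) 8.42 minutes is given by $v \times D^{a}_{1,tiz} \times$ [Number of trips of duration ≤ 8.42 min]. Trips of duration greater than 8.42 minutes are in the transient mode for the first 8.42 minutes of their operation. The transient mode VMT accumulated by such trips is given by $v \times 8.42 \times$ [Number of trips of duration > 8.42 min]. Therefore, the total transient mode VMT in time period $t$, of purpose $i$, due to trips produced from zone $z$ is given by the following expression
$$\mathrm{TVMT}^{a}_{tiz} = v \left[ D^{a}_{1,tiz}\, \Phi\!\left(c^{a}_{tiz}\right) + 8.42 \left(1 - \Phi\!\left(c^{a}_{tiz}\right)\right) \right] \times [\text{Total number of trips}] \tag{10}$$
where $\Phi$ is the standard normal cumulative distribution function, $c^{a}_{tiz} = (\ln 8.42 - \mu^{a}_{tiz})/\sigma$, and $\mu^{a}_{tiz}$ is the mean of the log duration of interzonal trips in equation (2), so that $\Phi(c^{a}_{tiz})$ is the share of trips of duration ≤ 8.42 minutes (durations are converted from minutes to hours when $v$ is in mph).
The mean duration of trips greater than 8.42 minutes, $D^{a}_{2,tiz}$, follows analogously from the left-truncated distribution.
The VMT in the stabilized mode in time $t$, for trip purpose $i$, originating in zone $z$ can be obtained as $v \times (D^{a}_{2,tiz} - 8.42) \times$ [Number of trips of duration > 8.42 min].
Therefore, the expression for the VMT accumulated in the stabilized mode is
$$\mathrm{SVMT}^{a}_{tiz} = v \left( D^{a}_{2,tiz} - 8.42 \right) \left(1 - \Phi\!\left(c^{a}_{tiz}\right)\right) \times [\text{Total number of trips}] \tag{12}$$
Finally, the fraction of VMT in transient versus stabilized modes in zone $z$, during time period $t$, and trip purpose $i$ can be obtained from equations (10) through (12).
The reader will also note that the running mode fractions for intrazonal trips may be readily obtained after replacing $D^{a}_{1,tiz}$ with $D^{l}_{1,tiz}$, $D^{a}_{2,tiz}$ with $D^{l}_{2,tiz}$, and $\mu^{a}_{tiz}$ with $\mu^{l}_{tiz}$, respectively, in equations (10) through (12).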
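The transient and stabilized split can be sketched in Python as follows. The sketch assumes log-normal durations and a single constant speed (which then cancels out of the fractions), and it uses the exact partial expectation of the log-normal distribution; the paper's own analytical expressions for the truncated means are not reproduced in the text and may differ in form. The function name and the parameter values are hypothetical.

```python
import math
from statistics import NormalDist

PHI = NormalDist().cdf
CUTOFF_MIN = 505 / 60.0   # EPA transient/stabilized boundary, about 8.42 minutes

def running_mode_fractions(mu, sigma, cutoff=CUTOFF_MIN):
    """Transient and stabilized VMT shares for ln(d) ~ N(mu, sigma^2),
    assuming one constant speed (which cancels out of the ratio)."""
    c = (math.log(cutoff) - mu) / sigma
    share_short = PHI(c)                        # trips no longer than the cutoff
    mean_all = math.exp(mu + sigma ** 2 / 2.0)  # mean duration of all trips
    # E[d * 1(d <= cutoff)]: partial expectation of the log-normal.
    partial_short = mean_all * PHI(c - sigma)
    # Short trips are transient throughout; long trips only for `cutoff` minutes.
    transient = partial_short + cutoff * (1.0 - share_short)
    stabilized = mean_all - transient
    return transient / mean_all, stabilized / mean_all

# Hypothetical log-duration parameters for a long-trip and a short-trip segment.
t_long, s_long = running_mode_fractions(2.6, 0.9)
t_short, s_short = running_mode_fractions(0.5, 0.9)
```

Segments dominated by short trips, such as intrazonal trips, yield a larger transient share than the single default value would imply.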
VMT on Local Roads
As noted in the model estimation section, the intrazonal nature of a trip is captured through the interaction effects of $I_q$ with exogenous determinants of trip duration. The logarithm of the trip duration of intrazonal trips in time $t$, with trip purpose $i$, in zone $z$ is normally distributed, as shown in equation (3). It follows from this that the trip duration distribution of intrazonal vehicle trips in time $t$, with trip purpose $i$, in zone $z$ is log-normally distributed with a mean and variance given by the following expressions (see Johnson and Kotz 1970):
$$E\!\left(d^{l}_{tiz}\right) = \exp\!\left(\mu^{l}_{tiz} + \frac{\sigma^2}{2}\right), \qquad \mathrm{Var}\!\left(d^{l}_{tiz}\right) = \exp\!\left(2\mu^{l}_{tiz} + \sigma^2\right)\left[\exp\!\left(\sigma^2\right) - 1\right]$$
where $\mu^{l}_{tiz}$ is the mean of the log duration of intrazonal trips in equation (3).
The mean trip length of intrazonal trips is the product of $E(d^{l}_{tiz})$ and the average speed on local roads (which many MPOs assume to be around 20 mph; if data on the variation of average speeds with zone, time period, and/or trip purpose are available, the corresponding average speed may be used). The total VMT on local roads can next be estimated as the product of the mean intrazonal trip length and the total intrazonal vehicle trips (obtained from the trip distribution step in the travel demand modeling process). Our methodology accommodates the variation in VMT on local roads with time of day, trip purpose, and zonal socio-demographic and land-use characteristics through the variation of the average intrazonal trip duration with these characteristics.
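A short Python sketch of the local-road VMT calculation follows, assuming the standard mean and variance of a log-normal variable and the 20 mph local speed mentioned above. The function names and the example inputs are hypothetical.

```python
import math

def lognormal_mean_var(mu, sigma):
    """Mean and variance of d when ln(d) ~ N(mu, sigma^2)."""
    mean = math.exp(mu + sigma ** 2 / 2.0)
    var = math.exp(2.0 * mu + sigma ** 2) * (math.exp(sigma ** 2) - 1.0)
    return mean, var

def local_vmt(mu_intra, sigma, intrazonal_trips, speed_mph=20.0):
    """Local-road VMT for one zone, time period, and trip purpose:
    mean intrazonal duration (minutes) -> mean length (miles) -> total VMT."""
    mean_minutes, _ = lognormal_mean_var(mu_intra, sigma)
    mean_length_miles = mean_minutes / 60.0 * speed_mph
    return mean_length_miles * intrazonal_trips

# Hypothetical zone: log-duration mean 1.9, sigma 0.8, 350 intrazonal trips.
zone_vmt = local_vmt(1.9, 0.8, 350)
```

Note that the mean duration depends on $\sigma^2$ as well as on the mean of the log duration, so exponentiating the predicted log duration alone would understate the mean trip length.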
To summarize this section, we presented the formulation for a model of trip duration as a function of trip purpose, time of day, and zonal/trip attributes. We proposed methods that can be implemented after estimation of the trip duration model to predict 1) running loss emissions, 2) running mode fractions, and 3) local road VMT. The outputs from the application of the model are the above three mobile source-related parameters for each time of day and trip purpose combination and for each zone in a planning area. The model framework can be integrated within a broader travel demand-air quality forecasting procedure in a straightforward fashion, as discussed later in this paper.
The data used in the empirical analysis were drawn from two sources: the 1996 Activity Survey conducted in the Dallas-Fort Worth (D-FW) area and the zonal land-use and demographics characteristics file for the D-FW area. These data sources were obtained from the North Central Texas Council of Governments (NCTCOG).
Several data assembly steps were involved in developing the sample. First, we converted the raw composite (travel and nontravel) activity file into a corresponding person trip file. Second, we identified person trips that were pursued using a motorized vehicle owned by the household. Third, we translated the person trip file into a corresponding vehicle trip file, which provided the sequence of trips made by each vehicle in the household. In this process, we extracted and retained information on the time of day of each vehicle trip start, traffic analysis process (TAP) zone of trip production, and the purpose of the activity being pursued at the attraction-end of the trip. Fourth, we aggregated the traffic survey zone (TSZ) level land-use and demographic characteristics to the TAP level and appended this information to each vehicle trip start based on the TAP in which the trip start occurs (there are about 5,000 TSZs in the D-FW planning area). Finally, we conducted several screening and consistency checks on the resulting dataset from the previous steps.3 As part of this screening process, we eliminated observations that had missing data on departure times, activity purposes, and/or on the TAP location of the vehicle trip start.
The final sample used for analysis includes 19,455 vehicle trip observations. Of these, 2,940 trips (15.1%) are intrazonal.
The dependent variable of interest in our analysis is the time duration of trips. The mean trip duration for interzonal trips is about 21 minutes with a standard deviation of 24 minutes, while the mean trip duration for intrazonal trips is about 11 minutes with a standard deviation of 18 minutes (the standard deviation is higher than the mean because of the substantial scatter of trip duration values, particularly in the higher end of the duration spectrum).
Three types of variables were considered to explain trip duration. These were: 1) trip purpose variables indicating the purpose of the trip, 2) time-of-day variables identifying the time of trip start, and 3) zonal and trip attributes. Interactions among these three sets of variables were also considered. In the description below, we briefly highlight some of the characteristics of these sets of variables.
Trip purpose was characterized by two dimensions: whether or not the trip was produced at home (home-based versus nonhome-based trips) and the purpose at the attraction end of the trip (i.e., whether the attraction end activity is work, school, social or recreational, shopping, personal business, or other). Of the 19,455 trips, 14,294 (73.4%) were home-based, and this percentage is independent of the intrazonal or interzonal nature of trips (see table 1). However, the percentage of interzonal trips with work as the attraction end is higher than the percentage of intrazonal trips with work as the attraction end.
The time-of-day variables were associated with one of the following six time periods: morning (midnight to 6:30 a.m.), a.m. peak (6:30 a.m. to 9:00 a.m.), a.m. off peak (9:00 a.m. to noon), p.m. off peak (noon to 4:00 p.m.), p.m. peak (4:00 p.m. to 6:30 p.m.), and evening (6:30 p.m. to midnight). The time periods for the a.m. and p.m. peaks were based on the peak period definitions employed by the NCTCOG transportation department in the D-FW area. The times for the off-peak periods were determined by splitting the remaining blocks of time at noon and midnight. The distribution of intrazonal and interzonal trips by time of day is presented in table 2. In general, the distributions by time of day are rather similar across intrazonal and interzonal trips.
We considered several zonal (TAP-level) land-use and demographic characteristics in our analysis. Of these, the following zonal attributes were significant determinants of trip duration: total zonal area, zonal household density, acreage in retail facilities, acreage in office space, number of people in service employment, acreage in institutional facilities (e.g., hospitals and churches), acreage in manufacturing and warehousing facilities, zonal median income, and presence of airports or airport-related infrastructure in the zone. The trip-related attribute included in the model was an indicator variable for whether or not the trip was intrazonal.
We obtained the final model specification of trip duration by systematically eliminating statistically insignificant variables and combining those found to have similar and comparable effects in terms of magnitude and significance.
Results of the Trip Duration Model
The empirical results for the log-linear regression model are presented in table 3, which provides the estimated values of the parameters in equation (1). All the coefficients are statistically significant at a level of 0.02 or lower (except for the school attraction end coefficient, which is significant only at about the 0.15 level). The $R^2$ value (bottom of the table) is about 0.2, which indicates that the model does better than a naive model that predicts the average (log) duration value for all trips. However, the low $R^2$ value also suggests room for further specification improvement in future studies.
The trip purpose variables were included with nonhome-based trips as the base category (for home-based vs. nonhome-based trips) and with work as the base attraction end activity. The time-of-day variables were introduced with the evening period as the base category. The morning period is combined with the a.m. peak period because of the very small fraction of trips in the morning period (as can be observed in table 2).
The positive coefficient on the home-based trip variable under the trip purpose category in table 3 indicates that home-based trips tend to be significantly longer than nonhome-based trips. The coefficients on the attraction end variables under the trip purpose category need to be interpreted jointly with the time-of-day and trip purpose interaction effects. The results show that work trips are of the longest duration across all times of the day and purposes, except for school trips pursued during the midday/evening periods and social-recreational trips pursued in the late evening. Shopping trips are the shortest across all times of the day. The coefficients on the time-of-day variables, when considered jointly with the time-purpose interaction effects, indicate that: 1) peak period trips are longer than nonpeak period trips, and this is particularly the case for work trips; and 2) work and nonwork trips undertaken during the evening period are shorter than trips taken at other times of the day (except for social-recreational trips, which are longer in the evening than at earlier times of the day).
Several zonal and other trip attributes have a statistically significant effect on trip duration. We classify these attributes into three categories: zonal size-related variables, zonal nonsize-related variables, and trip-related variables.
Among the size-related variables, a larger total area of a zone, in general, increases the duration of trips produced from that zone. This is particularly the case if the zone has a high acreage in office space. Similarly, trips produced from zones with a high number of people in service employment and with large acreage in manufacturing facilities also have longer durations. These may reflect congestion effects. On the other hand, acreage in retail and institutional facilities has a negative effect on trip duration, possibly due to greater accessibility to shopping and service-related activities in these zones.
The zonal nonsize-related variables indicate shorter trip durations in zones with high household density and with high household income. However, trips produced from zones with an airport have a longer duration. This latter effect may be caused by increased congestion on roadways in zones with airport-related infrastructure or may be due to airports occupying a large area in the zone and thereby reducing the number of activity opportunities in the zone. Of course, other reasons may be equally plausible.
Finally, intrazonal trips are significantly shorter in duration than interzonal trips, especially during the p.m. peak, although the magnitude of this effect is less for shopping and social-recreational trip purposes.
INTEGRATION WITH TRAVEL DEMAND MODELS
Existing travel demand models may be based on an activity approach or a trip approach. Activity-based travel demand models focus on the activities that people pursue, as a function of the locations and attributes of potential destinations, the state of the transportation network, and the personal and household characteristics of individuals (Ettema and Timmermans 1997). If such an approach is adopted in travel analysis, the activity stops made by individuals are explicitly modeled as a function of origin and destination activity categories, time of day, and zone of origin. Thus, information on trip purpose, time of trip start, and attributes of the zone of trip origin are readily available for all trips. Integration of the trip duration model developed in this paper within this framework is straightforward.
If a trip-based travel demand modeling framework is used, the trip duration model can be directly applied if the MPO develops zone-to-zone production-attraction interchanges for the disaggregate trip purpose and time of day categories identified in this paper. However, most MPOs use more aggregate classes of trip purpose and time periods (typically home-based work, home-based other, and nonhome-based trip purposes, and peak versus offpeak time periods). In this situation, the trip duration model can be used after post-processing the aggregate production-attraction trip interchange matrices to reflect the disaggregate classifications employed here. Factors obtained from travel surveys can be applied to achieve this post-classification. Tables 4, 5, and 6 present such factors developed for the D-FW region.
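The post-classification step can be pictured with a small Python sketch that splits the trips in each aggregate purpose and period class into finer classes using share factors. The class labels, trip counts, and factor values below are hypothetical and are not taken from tables 4 through 6; the function name is also an assumption.

```python
def disaggregate_trips(aggregate_trips, split_factors):
    """Split trips in each aggregate (purpose, period) class into the finer
    classes used by the duration model, using survey-based share factors."""
    detailed = {}
    for agg_class, trips in aggregate_trips.items():
        shares = split_factors[agg_class]
        if abs(sum(shares.values()) - 1.0) > 1e-9:
            raise ValueError(f"shares for {agg_class} must sum to 1")
        for fine_class, share in shares.items():
            detailed[fine_class] = detailed.get(fine_class, 0.0) + trips * share
    return detailed

# Hypothetical trips for one production-attraction zone pair.
aggregate = {("home-based other", "offpeak"): 200.0}
# Hypothetical split factors (not the values reported for the D-FW region).
factors = {
    ("home-based other", "offpeak"): {
        ("home-based shopping", "a.m. off peak"): 0.25,
        ("home-based shopping", "p.m. off peak"): 0.35,
        ("home-based social-recreational", "evening"): 0.40,
    }
}
detailed = disaggregate_trips(aggregate, factors)
```

Requiring the shares within each aggregate class to sum to one keeps the total number of trips unchanged by the post-processing.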
In this section we conduct two evaluations of our proposed model. First, we evaluate our assumption of normality of the distribution of (log) trip durations in our regression (equation 1). For this purpose, we utilize normal probability plots and also a formal statistical test of normality. Second, we compare the performance of our proposed model for trip duration activity parameters with the "default" MOBILE model parameters that remain fixed for all zones and for all time of day and trip purpose categories (this is the state of the practice in the D-FW and other metropolitan areas).
Testing the Normality Assumption
We use two methods to evaluate the assumption of normality of log trip durations in our regression model (equation 1). We first develop an "eyeball" evaluation of the distribution of trip durations in our sample using normal probability plots. Next, we apply a rigorous statistical test for examining the normality assumption. These evaluations are discussed in the next two sections.
The "Eyeball" Evaluation
The eyeball evaluation typically entails two probability plots. The first, the normal Q-Q plot (i.e., the normal quantile-quantile plot), is the plot of the ordered data values against the associated quantiles of the normal distribution. For data from a normal distribution, the points of the plot should lie close to a straight line. The procedure to produce a Q-Q plot involves the following two steps:
- Sort the n observed data points in ascending order so that $x_1 \le x_2 \le \dots \le x_n$, and plot these observed values against one axis of the graph; and
- plot $F^{-1}\left((i - r_{adj}) / (n + n_{adj})\right)$ on the other axis, where i is the rank of the respective observation on the ascending scale, $r_{adj}$ and $n_{adj}$ are adjustment factors (≤0.5), and $F^{-1}$ denotes the inverse of the cumulative standard normal distribution function.
The resulting Q-Q plot is thus a scatterplot of the observed values against the (standardized) expected values, given the normal distribution.
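The two-step procedure can be sketched in a few lines of Python. This is an illustrative sketch only: the adjustment factors default to 0.375 and 0.25, a common choice that satisfies the ≤0.5 condition but is an assumption here, and the data values are made up.

```python
from statistics import NormalDist


def qq_points(data, r_adj=0.375, n_adj=0.25):
    """Return (expected normal quantile, observed value) pairs for a Q-Q plot."""
    xs = sorted(data)                      # step 1: ascending order
    n = len(xs)
    std_normal = NormalDist()              # standard normal, mean 0 and sd 1
    return [
        # step 2: inverse normal CDF evaluated at the adjusted rank fraction
        (std_normal.inv_cdf((i - r_adj) / (n + n_adj)), x)
        for i, x in enumerate(xs, start=1)
    ]


# Made-up log trip durations (log of minutes), for illustration only.
log_durations = [1.6, 2.1, 2.3, 2.7, 3.0, 3.4, 3.9]
points = qq_points(log_durations)
for expected, observed in points:
    print(f"{expected:6.3f}  {observed:4.1f}")
```

Plotting the second element of each pair against the first gives the Q-Q plot; points lying near a straight line are consistent with normality.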
The second plot, the P-P plot (i.e., the probability-probability plot), is similar to the Q-Q plot except that the observed cumulative distribution function is plotted against the theoretical cumulative distribution function. As in the Q-Q plot, the values of the respective variable are first sorted into ascending order. The ith observation is then plotted on one axis as $i/n$ (i.e., the observed cumulative distribution function), and $F(x_{(i)})$ is plotted on the other axis, where $F(x_{(i)})$ represents the value of the cumulative normal distribution function for the respective observation $x_{(i)}$. If the normal cumulative distribution approximates the observed distribution well, then all points in this plot should fall onto a diagonal line.
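A comparable sketch for the P-P plot follows. It assumes that the theoretical normal distribution is fitted with the sample mean and sample standard deviation, which the text does not state explicitly; the data values are again made up.

```python
from statistics import NormalDist, mean, stdev


def pp_points(data):
    """Return (observed CDF, theoretical normal CDF) pairs for a P-P plot."""
    xs = sorted(data)
    n = len(xs)
    # Assumption: the normal distribution is fitted by sample mean and sd.
    fitted = NormalDist(mean(xs), stdev(xs))
    return [(i / n, fitted.cdf(x)) for i, x in enumerate(xs, start=1)]


# Made-up log trip durations (log of minutes), for illustration only.
sample = [1.6, 2.1, 2.3, 2.7, 3.0, 3.4, 3.9]
pp = pp_points(sample)
for observed_cdf, theoretical_cdf in pp:
    print(f"{observed_cdf:5.3f}  {theoretical_cdf:5.3f}")
```

Points close to the 45-degree diagonal indicate that the fitted normal distribution tracks the observed cumulative distribution.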
In figures 1 and 2, we present the Q-Q and P-P plots for the logarithm of the trip duration values in our sample. The Q-Q plot is fairly linear except for large values of trip duration. This is not uncommon, because the Q-Q plot will amplify small differences between the model and sample probabilities when they are both close to one. The P-P plot is also fairly linear, except in the middle; this effect is also expected, because the P-P plot amplifies small differences in the "middle" of the model and sample probabilities.4 The linearity of the two probability plots provides some justification for the use of a normal distribution for the logarithm of trip durations.
Formal Statistical Test
We next conduct a formal statistical test of normality on the logarithm of trip duration data in our sample. In the statistical literature, several goodness-of-fit statistics have been proposed to test the (null) hypothesis that the sample observations are independent draws from a normal distribution. These tests work well for small to medium sample sizes; however, they will almost always reject the null when n is large (as is the case in our sample; see Gibbons 1985, p. 76). To circumvent this problem, we randomly chose a subsample of 1,000 vehicle trips from our data and conducted the Kolmogorov-Smirnov test for normality (with Lilliefors correction) on this subsample (see Lilliefors 1967 for a description of this test). The Kolmogorov-Smirnov statistic for the subsample was 0.028, which is less than the critical value of 0.033 (computed as $1.031/\sqrt{n}$ with $n = 1{,}000$, at the 0.01 level of significance). Therefore, based on this test, we cannot reject the null hypothesis of normality at the 0.01 level of significance.
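The test can be sketched as follows. This is a minimal illustration on synthetic data, not the survey sample: the statistic is the usual Kolmogorov-Smirnov distance computed against a normal distribution fitted with the sample mean and standard deviation, and the critical value uses the large-sample approximation $1.031/\sqrt{n}$ for the 0.01 level, which is assumed here.

```python
import random
from math import sqrt
from statistics import NormalDist, mean, stdev


def lilliefors_statistic(data):
    """Maximum distance between the empirical CDF and a fitted normal CDF."""
    xs = sorted(data)
    n = len(xs)
    fitted = NormalDist(mean(xs), stdev(xs))   # parameters estimated from data
    d = 0.0
    for i, x in enumerate(xs, start=1):
        f = fitted.cdf(x)
        # Check the gap just after and just before each step of the ECDF.
        d = max(d, i / n - f, f - (i - 1) / n)
    return d


def critical_value_01(n):
    """Assumed large-sample Lilliefors critical value at the 0.01 level."""
    return 1.031 / sqrt(n)


# Synthetic stand-in for log trip durations; values are illustrative only.
rng = random.Random(0)
log_durations = [rng.gauss(2.7, 0.8) for _ in range(5000)]
subsample = rng.sample(log_durations, 1000)    # random subsample of 1,000 trips

d_stat = lilliefors_statistic(subsample)
crit = critical_value_01(len(subsample))
print(f"D = {d_stat:.3f}, critical value = {crit:.3f}")
print("reject normality" if d_stat > crit else "cannot reject normality")
```

With n = 1,000 the approximation gives a critical value of about 0.033, which matches the value quoted in the text.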
From the informal plots discussed in the previous section and the statistical test discussed above, we conclude that our normality assumption for the logarithm of trip duration is reasonable.
In this section, we compare the performance of our proposed model with that of the "default" MOBILE model. The default model uses the same trip duration activity parameters for all zones in the region, and these parameters do not vary by time of day or trip purpose.
For model evaluation, we need observed data on the proportion of trip duration activity (equivalent to VMT) in each of the six time bins for each zone, each time of day, and each trip purpose.5 The corresponding trip duration activity parameters predicted by both our model and the default MOBILE model are available for analysis (note again that the default MOBILE model-predicted trip duration activity parameters do not vary by zone, time of day, or trip purpose).
The evaluation of the proximity of estimated and actual trip duration activity parameters can then be based on a pseudo-$R^2$ value computed as

$$R^2 = \frac{\sum_z \sum_t \sum_i \sum_k \left(\hat{P}_{kitz} - \bar{P}_{kit}\right)^2}{\sum_z \sum_t \sum_i \sum_k \left(P_{kitz} - \bar{P}_{kit}\right)^2}$$

where
$P_{kitz}$ is the actual trip duration activity parameter (i.e., the proportion of VMT accrued) for trip duration bin k, trip purpose i, time period t, and zone z,
$\hat{P}_{kitz}$ is the model-predicted trip duration activity parameter, and
$\bar{P}_{kit}$ is the area-wide average parameter for trip duration bin k for trip purpose i in time period t.
The denominator is the total variation in the actual trip duration activity parameter values around the mean area-wide value, summed across all trip duration bins for all zones, time periods, and trip purposes. The numerator represents the variation explained by the model.
The denominator in the above equation cannot be computed with the sample used in this paper, because we do not have adequate observations in each zone to obtain meaningful averages of VMT accrued in each time duration bin k for each zone and for each time-of-day and activity purpose combination. However, since the denominator remains the same for our proposed model and the default model, a comparison of the two alternative models can be made by taking the ratio of the values of the numerators of the proposed and default models. This statistic can be viewed as a "net performance measure" that represents an index of the total variation in the trip duration activity parameters explained by the proposed model as compared with the default model. A value of the net performance measure that is greater than unity will reveal that our proposed model is superior.
The computation of the net performance index as discussed earlier is still tedious. To simplify, we computed the measure for a restricted version of our proposed model vis-à-vis the default model. The restricted version ignores variations in the trip duration activity parameters across zones. Thus, we get the activity parameter predictions from our proposed model for a single representative zone (with characteristics representing the average of all the zones in the sample) for each time duration bin k and for each activity purpose and time-of-day combination. We also obtained the corresponding trip duration activity parameters from the default model, which assigns the same values of the proportion of VMT accrued in each time bin k across all activity purposes and times of the day (table 7). The net performance measure can be computed by evaluating the closeness of these predictions to the average values for each time of day and activity purpose obtained from the sample.
The value of the net performance measure is 3.89. This shows that our proposed model performs about four times better than the default model in explaining the variation in trip duration activity parameters across zones, time of day, and trip purpose. We also computed performance measures for each of three broad time periods (peak, offpeak, and evening) and for each of the 12 trip purpose combinations. Table 8 presents these performance measures. As can be seen, the proposed model outperforms the default model in all the categories, indicating that it captures the trip duration activity parameters better than the default model.
In addition to our performance measure, we compared the two models through a statistical test. For this, we note that the disaggregate version of the default MOBILE model is equivalent to the "constant-only" specification for the log(trip duration) in equation 1. An F-test of the null hypothesis that all the parameters (except the constant) are equal to zero would therefore test our model (the "unconstrained" specification) against the default model (the "constrained" specification). The corresponding F-statistic is 229.05. This very strongly rejects the null hypothesis that the zonal characteristics, time of day, and trip purpose do not affect trip duration (the relevant critical F-value at the 0.01 level of significance is 2.36).
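The comparison of the constrained and unconstrained specifications can be illustrated with the standard F-statistic for nested linear regressions. The sketch below is generic: the function name, argument names, and the sums of squared residuals are hypothetical and are not the values underlying the statistic reported above.

```python
def f_statistic(ssr_constrained, ssr_unconstrained,
                num_restrictions, n_obs, n_params_unconstrained):
    """F-statistic for testing a constrained model against an unconstrained one.

    ssr_*: sums of squared residuals of the two fitted regressions.
    num_restrictions: number of coefficients set to zero under the null.
    n_params_unconstrained: coefficients in the unconstrained model,
        including the constant.
    """
    explained_per_restriction = (
        (ssr_constrained - ssr_unconstrained) / num_restrictions
    )
    residual_variance = ssr_unconstrained / (n_obs - n_params_unconstrained)
    return explained_per_restriction / residual_variance


# Hypothetical numbers, chosen only to make the arithmetic easy to follow.
f_value = f_statistic(
    ssr_constrained=120.0,     # constant-only ("default") specification
    ssr_unconstrained=100.0,   # full specification
    num_restrictions=4,
    n_obs=105,
    n_params_unconstrained=5,
)
print(f_value)  # (20 / 4) / (100 / 100) = 5.0
```

The null hypothesis is rejected when the computed value exceeds the critical F-value for the chosen significance level and the corresponding degrees of freedom.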
The modeling of trip durations in a metropolitan area is important for three reasons. First, trip duration activity parameters used by the MOBILE emissions factor model to estimate running loss emissions can be developed from the trip duration distribution. Second, the trip duration distribution provides information for estimating operating mode fractions, which are needed by MOBILE5 to estimate emissions rates. Third, the trip duration distribution can be used to predict the VMT accumulated on local roads in the region.
Trip duration is likely to depend on various factors such as trip purpose, time of day of the trip start, and other land-use and socio-demographic characteristics of the zone of trip production. In this paper, we formulated and implemented a methodology for modeling trip durations as a function of these characteristics, using vehicle trip data from household travel surveys and supplementary zonal, demographic, and land-use data. The approach involves developing the distribution of the duration of trips using a log-linear regression model. The modeling framework is implemented in the context of mobile source emissions analysis for the Dallas-Fort Worth area of Texas. Model evaluation indicates that the proposed model is superior to the default model in explaining the variation in trip duration activity parameters across zones, times of day, and trip purposes.
The proposed model contributes significantly toward improved mobile source emissions forecasting by systematically developing area-specific estimates of running loss emissions, operating mode fractions, and VMT on local roads. A distinguishing characteristic of the methodology is the straightforward manner in which model parameters estimated from vehicle trip data can be applied to obtain zonal-level trip duration distributions. The model can be integrated easily within various travel demand-air quality modeling frameworks.
Future work in this area could include developing a generalized posterior distribution for trip duration using a Bayesian framework that infers the nature of the distribution of trip durations from the sample. The trip duration activity parameters could then be computed by numerically evaluating the VMT proportions from this posterior distribution. Further, rather than parameterize the trip duration distribution, a flexible semi-nonparametric model could be developed, which could capture (possible) nonlinear responses in trip durations to changes in the exogenous variables.
This study was funded by the Texas Department of Transportation project "Transportation Control Measure Effectiveness Evaluation in Ozone Non-Attainment Areas." The authors would like to thank Bill Knowles, Carol Nixon, Wayne Young, and George Reeves for their valuable input throughout this research effort. Thanks are also due to Ken Cervenka and Mahmoud Ahmadi of the North Central Texas Council of Governments (NCTCOG) for providing, and clarifying data issues related to, the 1996 activity survey and the zonal land-use characteristics file for the Dallas-Fort Worth area. Ken Kirkpatrick and Christopher Klaus helped clarify current NCTCOG transportation air quality procedures. The authors would also like to thank three anonymous reviewers for their constructive suggestions on an earlier version of the paper. Finally, Lisa Weyant helped with typing and formatting.
Allen, W.G. and G.W. Davies. 1993. A New Method for Estimating Cold-Start VMT. Compendium of Technical Papers from ITE 63rd Annual Meeting, The Hague, The Netherlands, September 22.
Brodtmen, K.J. and T.A. Fuce. 1984. Determination of Hot and Cold Start Percentages in New Jersey, Report FHWA/NJ-84/001. Prepared for the U.S. Department of Transportation, Federal Highway Administration by the New Jersey Department of Transportation. July.
Chatterjee, A., P.M. Reddy, M. Venigalla, and T.L. Miller. 1996. Operating Mode Fractions on Urban Roads Derived by Traffic Assignment. Transportation Research Record 1520:97-103.
Ellis, G.W., W.T. Camps, and A. Treadway. 1978. The Determination of Vehicular Cold and Hot Operating Mode Fractions for Estimating Highway Emissions. Alabama Highway Department.
Ettema, D. and H. Timmermans, eds. 1997. Activity-Based Approaches to Travel Analysis. Pergamon Books.
Frank, L.D., B. Stone, Jr., and W. Bachman. 2000. Linking Land-Use with Household Vehicle Emissions in the Central Puget Sound: Methodological Framework and Findings. Transportation Research D 5:173-196.
Gibbons, J.D. 1985. Nonparametric Methods for Quantitative Analysis, 2nd Edition. Columbus, OH: American Sciences Press.
Glover, E.L. and D.J. Brzezinski. 2001. Trip Length Activity Factors for Running Loss and Exhaust Running Emissions, Report M6.FLT.005. U.S. Environmental Protection Agency, Assessment and Standards Division. Available at http://www.epa.gov/otaq/models/mobile6/m6tech.htm, as of June 9, 2003.
Greene, W.H. 1997. Econometric Analysis. Upper Saddle River, NJ: Prentice-Hall International. pp. 948-952.
Johnson, N. and S. Kotz. 1970. Distributions in Statistics: Continuous Univariate Distributions. New York, NY: Houghton Mifflin.
Law, A.M. and W.D. Kelton. 1991. Simulation Modeling and Analysis, 2nd Edition. New York, NY: McGraw Hill.
Lilliefors, H.W. 1967. On the Kolmogorov-Smirnov Test for Normality with Mean and Variance Unknown. Journal of the American Statistical Association 62, June.
Nair, H.S., C.R. Bhat, and R.J. Kelly. 2002. Modeling Soak-Time Distribution of Trips for Mobile Source Emissions Forecasting: Techniques and Applications. Transportation Research Record 1750:24-31.
U.S. Environmental Protection Agency (USEPA), Office of Mobile Sources. 1993. Federal Test Procedure Review Project: Preliminary Technical Report, EPA 420-R-93-007. Available at http://www.epa.gov/oms/regs/ld-hwy/ftp-rev/ftp-summ.txt, as of June 9, 2003.
Venigalla, M., A. Chatterjee, and M.S. Bronzini. 1999. A Specialized Equilibrium Assignment Algorithm for Air Quality Modeling. Transportation Research D 4:29-44.
Address for Correspondence and End Notes
Authors' addresses: Harikesh S. Nair, 5532 S. Kenwood Ave., Apt. 108, Chicago, IL 60637. Email: firstname.lastname@example.org.
Corresponding author: Chandra R. Bhat, Department of Civil Engineering, The University of Texas at Austin, 1 College Station C1761, Austin, TX 78712. Email: bhat@mail.utexas.edu.
KEYWORDS: local-road VMT, MOBILE emissions factor model, mobile source emissions, operating mode fractions, running loss emissions, trip duration.
2. $V_k$ may be obtained from local metropolitan area data or using the following national default values obtained from the 1995 Nationwide Personal Transportation Survey data: 18.96 mph (for trips of duration 0-10 minutes), 20.80 mph (for trips of duration 11-20 minutes), 26.40 mph (for trips of duration 21-30 minutes), 29.14 mph (for trips of duration 31-40 minutes), 33.60 mph (for trips of duration 41-50 minutes), and 45.30 mph (for trips of duration greater than 51 minutes). The fraction of intrazonal trips originating from zone z can be obtained from the sample used for estimation. If the sample data do not support evaluation of this fraction for all zones, it can be determined from the zone-to-zone production-attraction trip interchanges matrices obtained at the end of the trip distribution step in the travel demand modeling process.
5. This information is not available in our dataset, because we observed only a sample of trips made by households in each zone that did not span all of the time of day, trip purpose, and trip duration bin categories.