Predicting the Construction of New Highway Links
University of Minnesota
This paper examines new highway construction based on the status of the network, traffic demand, project costs, and budget constraints. The data span two decades and consist of descriptions of physical attributes of the network, the construction and expansion history, and average annual daily traffic values on each of the links. An algorithm is developed to designate adjacent and parallel links in a large network. A nonlinear cost model for new construction and highway expansion is developed for the Minneapolis-St. Paul metropolitan area. Results show that new links providing greater potential access are more likely to be constructed and that more links will be constructed when the budget is larger, which supports the underlying economic theory. The models developed here have important implications for planning and forecasting, allowing us to predict how networks might be altered in the future in response to changing conditions.
The 240 km of paved road in the United States in 1900 increased to about 6.4 million km in 2000, providing virtually 100% of the U.S. population with almost immediate access to paved roadways (USDOT 2002). The growth or decline of transportation networks obviously affects a region's social and economic activities, yet the dynamics of such changes are among the least understood topics in transportation research and regional science. This lack of understanding is often revealed in the long-range planning efforts of metropolitan planning organizations (MPOs), where transportation network change is treated exclusively as the result of top-down decisionmaking. In fact, changes to the transportation network are the result of numerous small decisions (and some large ones) by semi-autonomous entities (firms, developers, towns, cities, counties, state department of transportation districts, MPOs, and states) in response to market conditions and policy initiatives. Understanding how markets and policies translate into facilities on the ground is essential for scientific understanding and for improving forecasting, planning, policymaking, and evaluation.
The study of network growth has been limited. Taaffe et al. (1963) explored the economic, political, and social forces behind infrastructure expansion in underdeveloped countries and found that roads are initially developed to connect regions of economic activity and that feeder roads later connect to these initial investments. Garrison and Marble (1965) observed that connections to the nearest large neighbor explained the order of rail network construction in Ireland. Grübler (1990) found that the growth of infrastructure follows a logistic curve and that road infrastructure in developed countries has reached saturation levels. Yamins et al. (2003) developed a simulation that grows urban roads using simple connectivity rules proportional to the activity at locations. Yerra and Levinson (2003) developed a simulation model to capture the expansion of existing links; their results also show that hierarchical arrangements of roads (i.e., specific routes with continuous attributes) are emergent properties of transportation networks.1 Several studies have examined specific networks, for example, the London Underground (Barker and Robbins 1975), but no general theoretical framework has been given for incremental network growth at the microscopic level.
Our study focuses on understanding the conditions under which new links are constructed (as opposed to existing links being improved) on a highway network. The construction of new links can be modeled in several ways, assuming we have the location of possible and existing nodes. We could assume that all (or a very large number of) nodes are connected, but at some very slow speed, and then use a network investment model to improve selected links while allowing others to wither, much as a neural network learns. In contrast to this process, we could assume that, for every node, there is a set of possible nodes it can connect with (neighbors within a certain radius to which it is not already connected). The connections made depend on underlying conditions.
It is the second approach that we investigate in this paper. Specifically, we want to understand the effects of travel demand, cost of construction, budget, and the surrounding conditions on the generation of new links. A highway network is thought to be expanded or constructed due to congested traffic conditions or in anticipation of regional economic development. Limited budgets and existing land uses constrain the number of new links constructed in a given period. The traffic level on parallel links is expected to be a highly significant factor in new construction. Also, the number of potential trips on the new link is thought to be an important factor in its initiation.
Theory and statistical techniques used for this study are explained in the next section of this article. The following section provides a description of the data used and its assembly. In that section, adjacent and parallel links are designated and the model used to estimate the cost of construction is described. The next section presents the model we used and poses the specific hypotheses. Results are then presented, followed by conclusions.
THEORY AND STATISTICAL MODELING
Construction of a new link may alleviate traffic congestion or open a new area to development by increasing accessibility. Such a link could lead to the availability of additional routes and cause traffic patterns to change. Each of the variables considered affects either the supply curve or the demand curve and can shift the equilibrium.
A higher transportation budget (B) increases the ability to expand or construct highways resulting in an outward shift in the demand curve. Increasing the cost of construction decreases its likelihood. Previous studies empirically show that, in mature networks, the capacity added to the system decreases over time (Nakicenovic 1988; Grübler 1990). A marginal increase in capacity decreases average travel demand per lane (demand by consumers) as existing capacity increases.
Expanding a link means additional trips on that link due to re-routing and rescheduling of trips and also due to induced demand (Parthasarathi et al. 2003; Levinson and Kanchi 2002; Fulton et al. 2000; Noland 2001). In light of induced demand, the effect of roadway expansion in reducing traffic congestion is not fully understood. Furthermore, although consumers' surplus increases after the expansion, travelers are inconvenienced during its expansion.
Long links take more time to complete and diverting traffic during that period is difficult. The possibility of constructing a new link increases in such scenarios and hence the condition of traffic in the surrounding links is a crucial factor for new link construction. Networks, because of land scarcity, tend to grow more in the peripheries once they reach saturation levels near downtown.
Due to the discrete nature of the dependent variable (a new link is constructed or not), we considered discrete choice modeling to be appropriate. Initial modeling was done using a logit model, although there are certain limitations (Haynes et al. 1988; Haynes and Fotheringham 1991). The logit probabilities are derived under the assumption that the unobserved portion of the utility (combined with the error term) is distributed in accordance with the extreme value distribution. The probability of a decisionmaker choosing alternative a is given by the standard logit form

$$P_a = \frac{\exp(\beta' X_a)}{\sum_{j} \exp(\beta' X_j)}$$

where the sum runs over all available alternatives j.
Logit models assume that tastes are invariant across the population and consequently estimate fixed coefficients for the variables. In general, individual tastes vary across the population and this variation in tastes (random effects) should be included in modeling. In mixed (random parameter) logit models, the unobserved heterogeneity (individual-specific effects) is taken care of (McFadden and Train 2000; Train and Brownstone 1999; Hensher 2001). The likelihood function is similar to logit models, but the coefficient of some variables is not fixed across the population. Because our models did not suggest varying coefficients for variables, only the constant taste variable is assumed to vary across the population. In this model, the utility of an alternative a is given by
$$U_a = \beta' X_a + (\eta_a + \varepsilon_a)$$
where
$U_a$ = utility of alternative $a$,
$X_a$ = vector of variables,
$\eta_a$ = random term with zero mean (any distribution),
$\varepsilon_a$ = random term with extreme value distribution.
Given the value of the random term η from its distribution, the choice probability is again logit. Because we do not know the value of the random term, we integrate the logit probability over all values of the random term using its density function f(η | Ω). The choice probability of an alternative for an individual is then given by

$$P_a = \int \frac{\exp(\beta' X_a + \eta_a)}{\sum_{j} \exp(\beta' X_j + \eta_j)} \, f(\eta \mid \Omega) \, d\eta$$
where Ω denotes the distributional parameters of the random term in the likelihood function. This is called mixed logit because the choice probability is a mixture of logit probabilities, with the distribution of the random term acting as the mixing distribution. The integral above does not have a closed form in general. To overcome this problem, for each individual the logit probability is averaged over a set of simulated values of the random term. Taking this average as the choice probability, the log of the product of the simulated probabilities of all individuals (the simulated log-likelihood) is maximized to obtain the coefficients of the utility functions of each alternative.
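The simulation step described above can be sketched for the binary case (build or do not build) as follows. This is a minimal illustration, not the estimation code used in the study: the function names, the normal distribution for the random term, and the normalization of the "do not build" utility to zero are all assumptions made for the example.

```python
import math
import random


def logit_prob(v):
    """Binary logit probability of 'build' when the other alternative has utility 0."""
    return 1.0 / (1.0 + math.exp(-v))


def simulated_mixed_logit_prob(beta, x, sigma, n_draws=500, seed=0):
    """Average the logit probability over draws of a random intercept.

    beta, x : coefficient and variable vectors (same length)
    sigma   : standard deviation of the random term (assumed normal here)
    """
    rng = random.Random(seed)
    v = sum(b * xi for b, xi in zip(beta, x))  # systematic utility beta'X
    total = 0.0
    for _ in range(n_draws):
        eta = rng.gauss(0.0, sigma)            # one simulated value of the random term
        total += logit_prob(v + eta)           # conditional on eta, the model is logit
    return total / n_draws


# Hypothetical inputs for illustration only.
p = simulated_mixed_logit_prob(beta=[0.8, -0.3], x=[1.0, 2.0], sigma=1.0)
```

With `sigma` set to zero the random term vanishes and the function returns the plain logit probability, which is a convenient check on the implementation.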
The logarithm of the simulated probability is a biased estimator of the logarithm of the true probability. This bias decreases as the number of draws increases. In the simulations, values from a uniform random generator are converted to the distribution of the random variable. With a pseudo-random number generator, large sections of the distribution may go unsampled in a given simulation, because even coverage is guaranteed only in the limit of infinitely many draws. Halton draws2 have been suggested for their specific advantage of even coverage. It has been found that 125 Halton draws are as efficient as 2,000 random draws in simulations of this kind (Bhat 2001). To reduce the computational time and increase the efficiency, Halton draws were used for this study.
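A short sketch of how Halton draws can be generated and converted to another distribution is given here. It uses the standard radical-inverse construction; the function names are illustrative, and the conversion to a standard normal variable is an assumption chosen only to show the inverse-CDF step.

```python
from statistics import NormalDist


def halton(index, base):
    """Return element number `index` (starting at 1) of the Halton sequence for a prime base."""
    result = 0.0
    fraction = 1.0 / base
    i = index
    while i > 0:
        result += fraction * (i % base)  # next digit of index in the given base
        i //= base
        fraction /= base
    return result


def halton_draws(n, base=2):
    """First n Halton numbers in (0, 1) for the given prime base."""
    return [halton(i, base) for i in range(1, n + 1)]


uniform_draws = halton_draws(125, base=3)
# Convert the evenly spread uniform values to draws from the target distribution
# through its inverse CDF (a standard normal is assumed here for illustration).
normal_draws = [NormalDist().inv_cdf(u) for u in uniform_draws]
```

For base 2 the sequence begins 0.5, 0.25, 0.75, which shows how each new pass fills the gaps left by the previous one.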
DATA
The dataset for this study is built using data from three different sources. The Metropolitan Council of the Twin Cities of Minneapolis and St. Paul, Minnesota, provided network data for 1995 with length and location of each link. Each link is identified by its start and end nodes. Data on average annual daily traffic were obtained from the Transportation Information Systems Division of the Minnesota Department of Transportation. Data on construction of new links and expansion of the existing links were obtained from the local Transportation Improvement Program and the Hennepin County Capital Budget for 1978 to 1998. Data on new county highways are available only for Hennepin County. Hennepin is the largest of the seven counties and contains the city of Minneapolis. Using the investment data, a network for each of the years is built with 1995 used as the base network. The remaining dataset was integrated using ArcView geographic information systems software and custom computer programs.
While links to be expanded are chosen from the existing network, when a new link is going to be constructed, it is selected from a set of possible links between nodes. In the case of the Twin Cities network, creation of new nodes because of new construction was not observed. The possible set of new links is, therefore, based only on existing nodes. Theoretically, a node can be connected to any of the remaining nodes.3
The mean length of newly constructed links was 0.68 km and the maximum was 4.54 km. Because of the large number of possible connections and high redundancy levels within the radius of 4.54 km, a shorter range of possible lengths was considered. In the new scenario, only links between 200 meters and 3.2 km in length were considered. These bounds were arrived at by removing new construction in the bottom and top 5 percent of the length distribution. We observed that new nodes are seldom created by new construction in the Twin Cities network. This indicates that the set of possible new links should exclude links that cross any existing link of a higher hierarchy level than the link being constructed, because such a crossing would create a new node. However, new links can cross lower level roads without technically intersecting them (via overpasses).
With the above restrictions, each node was found to have on average a set of 10 possible connections, with 29,804 possible new links. We found, however, that only 69 bi-directional new links (all highways) were actually constructed in the past two decades. There were, of course, many lower-level roads built and other higher-level roads widened, but those are not addressed here.
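The length-based part of this screening can be sketched as follows. The example assumes node coordinates in kilometers and an undirected list of existing links; the names are hypothetical, and the restrictions on crossing higher-level roads and on merged interchange nodes are deliberately left out to keep the sketch short.

```python
import math
from itertools import combinations

MIN_KM, MAX_KM = 0.2, 3.2  # length bounds for candidate links stated in the text


def candidate_links(nodes, existing_links):
    """List unconnected node pairs whose straight-line separation is within the bounds.

    nodes          : dict mapping node id -> (x_km, y_km)
    existing_links : iterable of (node_a, node_b) pairs already connected
    """
    connected = {frozenset(link) for link in existing_links}
    candidates = []
    for a, b in combinations(sorted(nodes), 2):
        if frozenset((a, b)) in connected:
            continue  # a link between these nodes already exists
        d = math.dist(nodes[a], nodes[b])
        if MIN_KM <= d <= MAX_KM:
            candidates.append((a, b, d))
    return candidates


# Hypothetical toy network for illustration.
toy_nodes = {1: (0.0, 0.0), 2: (1.0, 0.0), 3: (5.0, 0.0), 4: (0.1, 0.0)}
toy_candidates = candidate_links(toy_nodes, existing_links=[(1, 2)])
```

In the toy network only the pair (2, 4) survives: the pair (1, 2) is already connected, the pair (1, 4) is too short, and every pair involving node 3 is too long.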
Adjacent and Parallel Links in a Network
We needed to compute the potential amount of traffic a newly constructed link might serve based on the traffic on the nodes it connects. The links that would be connected, "adjacent links," were divided into two categories: supplier links and consumer links. Supplier links would supply traffic to the new link, while consumer links are the links the traffic would move to after traversing the new link. A link (ij) that is a supplier to another link (jk) may be a consumer link in the other direction (ji receives traffic from kj). A computer program was written to enumerate adjacent links for each of the possible links.
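A minimal sketch of this enumeration for one direction of a candidate link is shown here. It assumes directed links stored as (from_node, to_node) keys with their flows; the function names are illustrative. The product of total supplier and total consumer flows corresponds to the access variable A defined later in the model.

```python
def adjacent_links(candidate, flows):
    """Split the links touching a directed candidate link into suppliers and consumers.

    candidate : (start_node, end_node) of the possible new link
    flows     : dict mapping directed link (from_node, to_node) -> traffic flow
    """
    start, end = candidate
    # Supplier links feed traffic into the start node of the candidate.
    suppliers = [link for link in flows if link[1] == start and link[0] != end]
    # Consumer links carry traffic away from the end node of the candidate.
    consumers = [link for link in flows if link[0] == end and link[1] != start]
    return suppliers, consumers


def access_measure(candidate, flows):
    """Product of total supplier flow and total consumer flow."""
    suppliers, consumers = adjacent_links(candidate, flows)
    return sum(flows[l] for l in suppliers) * sum(flows[l] for l in consumers)


# Hypothetical directed flows for illustration.
toy_flows = {
    (1, 2): 100, (2, 1): 90,
    (3, 2): 50, (2, 3): 40,
    (4, 5): 200, (5, 4): 180,
}
```

Calling the functions with the candidate reversed, for example `(4, 2)` instead of `(2, 4)`, swaps the roles of the surrounding links, which mirrors the remark that a supplier in one direction is a consumer in the other.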
The parallel link can be thought of as the link that would bear most of the diverted traffic if the link in consideration were closed. This definition is extended to new links by assuming the link is constructed and then finding the parallel link in the existing network. It is necessary to identify parallel links, because they are the links currently serving the traffic of that area. Because of the large number of possible new links, parallel links were not identified using traffic assignment. Rather, parallel links were assigned to each of the possible links using fuzzy theory (Zadeh 1992; Kosko 1993). Fuzzy theory assumes a continuous truth-value rather than the deterministic Boolean values used conventionally. The sum composition method combined with appropriate weights was found suitable for our purposes.
In general, a parallel link is in the proximity of link L, approximately parallel to it in orientation, and of comparable length. Four attributes are defined to capture these requirements. The first attribute is based on the angular difference between the orientations of the two links, which should be as small as possible. The second attribute is the perpendicular distance from the midpoint of link L to the other link divided by the length of link L. The third attribute is the sum of the distances between the start nodes and between the end nodes of the two links being compared. The final attribute takes the ratio of the lengths of the two links into consideration. Mathematically, the four attributes are defined as follows:
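1. $\mathrm{Para} = 1 - (\text{angular difference}) / 45$
2. $\mathrm{Perp} = 1 - a \cdot (\text{perpendicular distance}) / (\text{length of link } L)$
3. $\mathrm{Dist} = 1 - b \cdot (\text{sum of node distances}) / (\text{length of link } L)$
4. $\mathrm{Comp} = 1 - c \cdot (\mathit{lratio} - 1)$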
where perpendicular distance is from the center of link L to the other link, node distances are distances between the corresponding start and end nodes, lratio is the ratio of length of the probable parallel link to the length of link L or the inverse of it, whichever is greater.4
In sum composition, computing the truth-value of each attribute and then summing these values gives the fuzzy output. Here, we modified this method by weighting the truth-values of the attributes based on the importance of each attribute in relation to others. The assumed parameters of a, b, and c and the assumed weights of the attributes are given in table 1. These values were calibrated to match our expectations of what should be the most parallel link using a few sample links. One parallel link was selected for each link.
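The weighted sum composition can be sketched as follows. The parameter values and weights below are placeholders (the calibrated values of table 1 are not reproduced here), and two further choices are assumptions of this sketch rather than statements from the text: truth-values are clamped to the interval [0, 1], and the node distances use whichever pairing of end points gives the smaller sum.

```python
import math

# Hypothetical parameters and weights, for illustration only.
A, B, C = 1.0, 0.5, 0.5
WEIGHTS = {"para": 0.4, "perp": 0.3, "dist": 0.2, "comp": 0.1}


def clamp(value):
    """Keep a truth-value inside [0, 1] (an assumption of this sketch)."""
    return max(0.0, min(1.0, value))


def orientation_deg(link):
    (x1, y1), (x2, y2) = link
    return math.degrees(math.atan2(y2 - y1, x2 - x1)) % 180.0


def angular_difference(link, other):
    d = abs(orientation_deg(link) - orientation_deg(other))
    return min(d, 180.0 - d)


def point_to_segment(p, seg):
    """Shortest distance from point p to the segment seg."""
    (x1, y1), (x2, y2) = seg
    dx, dy = x2 - x1, y2 - y1
    len2 = dx * dx + dy * dy
    if len2 == 0.0:
        return math.dist(p, seg[0])
    t = max(0.0, min(1.0, ((p[0] - x1) * dx + (p[1] - y1) * dy) / len2))
    return math.dist(p, (x1 + t * dx, y1 + t * dy))


def parallel_score(link, other):
    """Weighted sum of the four truth-values Para, Perp, Dist, and Comp."""
    len_l = math.dist(*link)
    len_o = math.dist(*other)
    mid = ((link[0][0] + link[1][0]) / 2.0, (link[0][1] + link[1][1]) / 2.0)
    node_sum = min(
        math.dist(link[0], other[0]) + math.dist(link[1], other[1]),
        math.dist(link[0], other[1]) + math.dist(link[1], other[0]),
    )
    lratio = max(len_o / len_l, len_l / len_o)
    truth = {
        "para": clamp(1.0 - angular_difference(link, other) / 45.0),
        "perp": clamp(1.0 - A * point_to_segment(mid, other) / len_l),
        "dist": clamp(1.0 - B * node_sum / len_l),
        "comp": clamp(1.0 - C * (lratio - 1.0)),
    }
    return sum(WEIGHTS[name] * truth[name] for name in truth)


def most_parallel(link, others):
    """Select the single existing link with the highest fuzzy score."""
    return max(others, key=lambda other: parallel_score(link, other))
```

For a candidate link running from (0, 0) to (1, 0), an existing link offset by 0.2 km and running in the same direction scores well above a perpendicular link that crosses the candidate's midpoint, so `most_parallel` returns the offset link.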
A cost function is needed to estimate the cost of possible new construction. Investment data obtained from the Metropolitan Council's Transportation Improvement Program and the Hennepin County Capital Improvement Program were used to estimate a modified Cobb-Douglas (log-log) model. This model was used to account for the non-linear behavior of some of the explanatory variables.
$$\ln(E_{ij}) = a + b_1 \ln(L_{ij} \cdot \Delta C_{ij}) + b_2 N + b_3 H_I + b_4 H_s + b_5 Y + b_6 \ln(P) + b_7 X$$
where
$E_{ij}$ = cost to construct or expand the link (in nominal thousands of dollars),
$L_{ij} \cdot \Delta C_{ij}$ = lane kilometers of construction (length × increase in number of lanes),
$N$ = dummy variable, 1 if new construction or 0 if expansion,
$H_I$, $H_s$ = dummy variables for Interstate highways ($H_I$) and state highways ($H_s$) (default = county highways),
$Y$ = year of completion (1979),
$P$ = period (duration) of construction (in years),
$X$ = distance of the link from the nearest downtown (Minneapolis or St. Paul) (in km).
The data consist of both expansions and new construction projects totaling 76 observations (more than 1 link can be expanded in a single project). Results of the model are shown in table 2. The coefficient of lane kilometers of construction (Lij*ΔCij) is less than one, indicating economies of scale in construction. As can be expected, the cost of a new construction project (N) is higher than expanding an existing link. The cost of construction increases with the hierarchy of the road (H) (H = 0 represents a county road). The year variable (Y) controls for inflation and the improving quality of the road construction. Longer duration projects (P) cost more and construction becomes costlier over time. The distance from the nearest downtown, entered as a linear variable (X), shows that the project cost decreases as it moves away from downtown areas. Downtown areas have higher traffic flows and land costs and hence restrict the construction flexibility, generating the extra cost.
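To show how the log-log form translates into a cost prediction, a small sketch follows. The coefficient values are invented placeholders whose signs merely match the discussion above; they are not the estimates reported in table 2, and the function and argument names are hypothetical.

```python
import math

# Hypothetical coefficients for illustration only (not the table 2 estimates).
COEF = {"a": 5.0, "b1": 0.5, "b2": 0.4, "b3": 1.0,
        "b4": 0.5, "b5": 0.05, "b6": 0.6, "b7": -0.02}


def predicted_cost(lane_km, is_new, interstate, state_hwy,
                   year_index, duration_years, km_to_downtown, coef=COEF):
    """Predicted cost in thousands of dollars from the modified Cobb-Douglas form."""
    ln_cost = (
        coef["a"]
        + coef["b1"] * math.log(lane_km)         # lane kilometers, log term
        + coef["b2"] * is_new                    # 1 if new construction, 0 if expansion
        + coef["b3"] * interstate                # Interstate dummy
        + coef["b4"] * state_hwy                 # state highway dummy
        + coef["b5"] * year_index                # year variable Y as defined above
        + coef["b6"] * math.log(duration_years)  # construction period, log term
        + coef["b7"] * km_to_downtown            # linear distance term
    )
    return math.exp(ln_cost)


base = predicted_cost(2.0, 0, 0, 0, 10, 2.0, 5.0)
doubled = predicted_cost(4.0, 0, 0, 0, 10, 2.0, 5.0)
```

Because the lane-kilometer coefficient is below one, doubling the lane kilometers multiplies the predicted cost by 2 raised to that coefficient, which is less than 2. This is the economies-of-scale property noted in the text.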
MODEL
Because few new links were built over the last two decades, construction was assumed to occur in five-year intervals and the dataset was built accordingly. The budget over these five years was summed to act as a budget constraint. Nodes connecting only local roads (below county highways) were not considered in modeling, because we did not have data on new construction of such roads. New construction in the next time interval can be modeled as
$$N_{ij}^{t+1} = f(L_{ij}, C_p, L_p, Q_p / C_p, A, E_{ij}, B, T, X, D)$$
where
$N_{ij}^{t+1}$ = dummy for new construction of link $ij$ in period $(t + 1)$,
$L_{ij}$ = length of link $ij$ along the road,
$C_p$ = capacity of the parallel link,
$L_p$ = length of the parallel link,
$Q_p$ = flow on the parallel link,
$Q_p / C_p$ = congestion measure on the parallel link,
$A$ = product of total supplier link flows and total consumer link flows (access),
$E_{ij}$ = cost of constructing the new link,
$B$ = transportation department's budget constraint,
$T$ = time period of construction,
$X$ = distance from the nearest downtown,
$D$ = number of nodes within the interval of 200 meters and 3.2 km.
Volumes on the links are directional.
Variable A can be considered an accessibility measure of the new link. It represents the effect of supplier link flows and consumer link flows on the probability of new construction. The effect of surrounding conditions was expected to be prominent in the construction of a new link compared with a link expansion. Based on that theory, the hypotheses are as follows:
- High congestion on the parallel link (Qp/Cp) favors construction of the link to relieve traffic on the parallel link.
- Higher capacity of the parallel link (Cp) decreases the likelihood of new construction, as capacity is already available. However, high capacity links are less likely to be expanded.
- Longer links (Lij) are less likely to be expanded because of the longer duration of construction.
- A longer parallel link (Lp) favors new construction, because longer links tend not to be expanded as often due to the duration inherent in such an expansion.
- High expected costs of construction (Eij) on the new link decrease the probability that the link will be built, while a higher transportation budget (B) increases that probability.
- A higher access score (A) for a link increases the chances of construction.
- As was observed in the literature, road construction has declined over time, and this was expected to be reflected in a negative sign on the year (T) variable.
- New links have a higher probability of being constructed far from downtown (X), as land acquisition is easier there.
- A large node density (D) in the surrounding area results in fewer new links being constructed, because the number of links is already high in these areas.
Binomial logit and mixed logit modeling were used to analyze the dataset. The results are shown in the following section.
RESULTS
A binomial logit model was used to estimate the construction of a new link between existing nodes. Results of the regression models are given in table 3. The coefficients of Cp, Eij, and X are negative and significant, while those of Lp, A, T, and B are positive and significant.
As has been noted earlier, the construction of a new link depends heavily on its surrounding conditions and alternate route conditions. The longer the parallel link (Lp), the higher the probability of a new link. This might be interpreted as reflecting the cost involved in the expansion of the longer parallel link and also as a result of the traffic diversion problems on the parallel link if it were expanded.
The coefficient of the capacity of the parallel link (Cp) is negative and significant, supporting our hypothesis. High capacity links already serve high volumes of traffic in an area (generated in or passing through that area) and hence reduce the need for a new link.
A high access measure (A) between two nodes tends to increase the probability of new construction connecting those nodes. Access is directly proportional to the total time savings due to new construction and hence it is logical that high demand between two nodes has this effect.
A higher cost of constructing a new link (Eij) reduces its probability of construction, as expected. Also, more new construction is possible when the budget (B) is higher.
Contrary to our hypothesis, the coefficient of the distance to the nearest downtown (X) is negative and significant, indicating that new links are more likely to be built nearer to downtown than in the suburbs. This probably reflects the completion of the Interstate Highway System in the Twin Cities, which saw the urban links finished last (in the past 20 years), while suburban links were completed as long as 40 years ago.
More new links are being constructed with the passage of time (T), refuting the hypothesis. Earlier studies showed decreasing expansion rates for existing links. This may reflect a policy shift from expansion to new construction. Expanding a road leads to traffic inconvenience during construction, a problem that can be avoided by new construction, which may explain the reasoning behind more new construction.
A mixed logit model was estimated to allow for the taste variances of individual links (i.e., of decisionmakers). Table 3 gives the results of the model. The log likelihood value was improved by 3%, indicating a better model. As mentioned earlier, changes in traffic demand were not considered due to the low number of new links. Considering changes in demand would require dropping one period of observation. The random term was assumed to have a triangular distribution and its estimated standard deviation is given in the table. Models with other possible distributions for the random term did not improve significance. The significant variance in the constant term reflects the variance in the links due to the effects of these omitted variables and the inherent taste variance (of decisionmakers). More data are needed to model new construction with other influencing variables.
However, a mixed logit model can to some extent encompass the effect of these variables. Omitted variables in a model increase the standard errors of the estimated coefficients and thus cloud the significance of some variables. For instance, the number of nodes in the surrounding area (D) is significant in the new model, supporting the hypothesis. The coefficients of variables changed significantly when the unobserved variance was accounted for. The z-values of the mixed logit model are higher than those of the logit model, indicating increased reliability of the estimated coefficients. In the mixed logit model, the coefficient of congestion on the parallel link is negative and significant, refuting our hypothesis (it was insignificant in the logit model).
Out of a network of 29,804 possible new links, there were 69 new links in the time period considered. Of the 69 most likely new construction links as predicted by the models, the logit model identified 17 links that were actually built. The mixed logit model performed better, predicting the same 17 links and an additional 5 new links correctly. In view of these results, mixed logit models perform better than conventional discrete choice models.
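The comparison described above amounts to ranking all candidate links by predicted probability, keeping as many as were actually built, and counting the overlap. A minimal sketch with hypothetical names and toy data:

```python
def top_k_hits(probabilities, built, k):
    """Count how many of the k highest-probability candidates were actually built.

    probabilities : dict mapping candidate link id -> predicted probability
    built         : set of link ids that were constructed
    k             : number of top-ranked candidates to check (69 in the study)
    """
    ranked = sorted(probabilities, key=probabilities.get, reverse=True)[:k]
    return sum(1 for link in ranked if link in built)


# Toy data for illustration only.
toy_probs = {"a": 0.9, "b": 0.1, "c": 0.5, "d": 0.7}
toy_built = {"a", "c"}
hits = top_k_hits(toy_probs, toy_built, k=2)
```

In the toy data the two highest-ranked candidates are "a" and "d", of which only "a" was built, so the count is 1. In the study's terms, the same count is 17 for the logit model and 22 for the mixed logit model with k = 69.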
CONCLUSIONS
This paper developed a model to predict the location of new highway construction based on the surrounding conditions of the new link, the estimated cost of construction, and a budget constraint. A new process for identifying potential construction projects was developed. The methodology reduces the number of possible new links drastically and paves the way for feasible modeling. The paper also provides a practical solution to the problem of identifying adjacent and parallel links in a large network and, using the investment data, a model to estimate the cost of potential new construction.
Results indicate significant dependence on parallel link attributes and potential access to traffic due to the new link. A newly constructed link provides an additional route; hence, its construction depends on the attributes of the links that presently serve the region. A high capacity route is sufficient to cater to the traffic generated in or going through the region and usually does not require a new construction project. New construction projects are less likely to be undertaken if they are costly and are limited by the available budget. New links are unnecessary when the region is well connected, as reflected by the node density variable. Two different types of discrete choice models were estimated to compare their performances. It was found that mixed logit models perform better than logit models and account for unobserved taste variance.
Although politics factor into these decisions, it should be noted that they are constrained by the decisions made in the past and by the present conditions of the network. The models suggest a number of significant factors that lead to new highway construction. The models estimated here also can be used to monitor the growth of the network given projected traffic demand for the existing links and values of model variables at present conditions. This would improve transportation planning by enabling modelers to predict pressures for additional links. Forecasting future demands on the transportation network requires a forecast of the network structure itself. Only with models of new link construction and link expansion can these forecasts be made.
REFERENCES
Barker, T.C. and M. Robbins. 1975. A History of London Transport, vols. 1 and 2. London, England: Allen and Unwin.
Bhat, C.R. 2001. Quasi-Random Maximum Simulated Likelihood Estimation of the Mixed Multinomial Logit Model. Transportation Research B 35:677–693.
Fulton, L.M., R.B. Noland, D.J. Meszler, and J.V. Thomas. 2000. A Statistical Analysis of the Induced Travel Effects in the U.S. Mid-Atlantic Region. Journal of Transportation and Statistics 3(1):1–14.
Garrison, W.L. and D.F. Marble. 1965. A Prolegomenon to the Forecasting of Transportation Development, U.S. Army Aviation Material Labs Technical Report. Office of Technical Services, U.S. Department of Commerce.
Grübler, A. 1990. The Rise and Fall of Infrastructures: Dynamics of Evolution and Technological Change in Transport. Heidelberg, Germany: Physica-Verlag.
Haynes, K. and S. Fotheringham. 1991. The Impact of Space on the Application of Discrete Choice Models. The Review of Regional Studies 20(2):39–49.
Haynes, K., D. Good, and T. Dignan. 1988. Discrete Spatial Choice Modeling and the Axiom of Independence from Irrelevant Alternatives. Socio-Economic Planning Sciences 22(6):241–251.
Hensher, D. 2001. The Valuation of Commuter Travel Time Savings for Car Drivers: Evaluating Alternative Model Specifications. Transportation 28(2):101–118.
Kosko, B. 1993. Fuzzy Thinking: The New Science of Fuzzy Logic. New York, NY: Hyperion.
Levinson, D. and S. Kanchi. 2002. Road Capacity and the Allocation of Time. Journal of Transportation and Statistics 5(1):25–45.
Levinson, D. and R. Karamalaputi. 2003. Induced Supply: A Model of Highway Network Expansion at the Microscopic Level. Journal of Transport Economics and Policy 37(3):297–318, September.
McFadden, D. and K. Train. 2000. Mixed MNL Models for Discrete Response. Journal of Applied Econometrics 15:447–470.
Nakicenovic, N. 1988. Dynamics and Replacement of U.S. Transport Infrastructures. Cities and Their Vital Systems: Infrastructure Past, Present and Future. Edited by J.H. Ausubel and R. Herman. Washington, DC: National Academy Press.
Noland, R.B. 2001. Relationships Between Highway Capacity and Induced Vehicle Travel. Transportation Research A 35(1):47–72, January.
Parthasarathi, P., D. Levinson, and R. Karamalaputi. 2003. Induced Demand: A Microscopic Perspective. Urban Studies 40(7):1335–1351.
Taaffe, E.J., R.L. Morrill, and P.R. Gould. 1963. Transport Expansion in Underdeveloped Countries. Geographical Review 53:503–529.
Train, K. and D. Brownstone. 1999. Forecasting New Product Penetration with Flexible Substitution Patterns. Journal of Econometrics 89:109–129.
U.S. Department of Transportation (USDOT), Bureau of Transportation Statistics. 2002. Transportation Indicators. Available at http://www.bts.gov/transtu/indicators/, as of January 2004.
Yamins, D., S. Rasmussen, and D. Fogel. 2003. Growing Urban Roads. Networks and Spatial Economics 3:69–85.
Yerra, B. and D. Levinson. 2003. The Emergence of Hierarchy in Transportation Networks, presented at the Western Regional Science Association Meeting, Rio Rico, AZ, February.
Zadeh, A.L. 1992. Fuzzy Logic for the Management of the Uncertainty. New York, NY: Wiley Publications.
Authors' addresses: David Levinson, Assistant Professor, 500 Pillsbury Dr. SE, Dept. of Civil Engineering, University of Minnesota, Minneapolis, MN 55455. E-mail: firstname.lastname@example.org.
Ramachandra Karamalaputi, Analyst, Capital One, 3470 Kilburn Circle, # 1028, Richmond, VA 23233. E-mail: email@example.com.
KEYWORDS: highway construction, cost model, transportation forecasting, network growth, mixed logit.
1. Specific routes with continuous attributes imply that connecting links do not necessarily have the same attributes. For instance, two four-lane links connecting at an intersection with two two-lane links imply that the four-lane links are part of a continuous route and the two-lane links are part of a different route. Because these routes are differentiated, some must be more important than others, which produces a hierarchy of roads.
2. Halton draws are generated using a prime number, p, as a seed. The interval (0,1) is divided into p equal intervals and the division points form the first numbers of the sequence. Each of the p intervals is again divided equally into p sub-intervals. The new sets of numbers are arranged in a particular fashion to continue the sequence. For a detailed discussion, see Bhat (2001).
3. Freeway interchanges were treated as a single node for this purpose. With the current network, this possible connection can be made with any of the nodes at interchanges. To overcome this feature in the dataset, all the nodes within 50 meters of each other were given the same node number. A computer program was written to accomplish this task and the resulting node set was used to investigate new construction.
4. Dist and Perp differ in that Perp considers the perpendicular (or shortest) distance between the links, while Dist looks at the distances between the beginnings and ends of the links, which may not be perpendicular. In a perfect grid network, the two variables would measure the same thing, but most networks are not perfect grids (see Levinson and Karamalaputi 2003).