2012 Commodity Flow Survey Overview and Methodology

Purposes and Uses of the Commodity Flow Survey
Industry Coverage
Shipment Coverage
Sample Design
First Stage – Establishment Selection
Sampling frame
Sample size and allocation
Second Stage – Reporting Week Selection
Third Stage – Shipment Selection
Data Collection
Imputation of Shipment Value or Weight
Reliability of the Estimates
Sampling Error
Nonsampling Error
Mileage Calculation
Methodological Changes to Mileage Calculation for the 2012 CFS
Air versus Parcel Mode
Routing a Shipment When Mode Is Other, Unknown, or Missing
Private Truck versus For-Hire Truck


The Commodity Flow Survey (CFS) is undertaken through a partnership between the Bureau of Transportation Statistics (BTS), U.S. Department of Transportation, and the U.S. Census Bureau, U.S. Department of Commerce. The first CFS was conducted in 1993 and has been followed by surveys in 1997, 2002, 2007, and now 2012. Since 1997, the survey has been conducted in years ending in “2” or “7”, aligning it with (and making it a component of) the economic census. The CFS produces data on the movement of goods in the United States. It provides information on commodities shipped, their value, weight, and mode of transportation, as well as the origin and destination of shipments from establishments in these industry sectors: manufacturing, mining, wholesale, and selected retail and services.

Purposes and Uses of the Commodity Flow Survey
The CFS assesses the demand for transportation facilities and services, energy use, and safety risk and environmental concerns. CFS data are used by policy makers and transportation planners in various federal, state, and local agencies. Additionally, business owners, private researchers, and analysts use the CFS data for analyzing trends in the movement of goods, mapping spatial patterns of commodity and vehicle flows, forecasting demands for the movement of goods, and determining needs for associated infrastructure and equipment.


Industry Coverage
The 2012 Commodity Flow Survey (CFS) covers business establishments with paid employees that are located in the United States and are classified using the 2007 North American Industry Classification System (NAICS) in mining, manufacturing, wholesale, and selected retail and services trade industries, namely, electronic shopping and mail-order houses, fuel dealers, and publishers. Additionally, the survey covers auxiliary establishments (i.e., warehouses and managing offices) of multi-establishment companies. For the 2012 CFS, an advance survey (pre-canvass) of approximately 100,000 establishments was conducted to identify establishments with shipping activity and to obtain an accurate measure of that activity. Surveyed establishments that reported shipping activity, together with the nonrespondents to the advance survey, were included in the CFS sample universe.

Establishments classified in transportation (with the exception of some establishments in 484 and 4931), construction, and most retail and services industries are excluded from the survey. Farms, fisheries, foreign establishments, and most government-owned establishments are also excluded.

In-scope industries for the 2012 CFS were selected based on the 2007 version of the NAICS, while the industries included in the 2007 CFS were selected based on the 2002 version of the NAICS. The industries included in the 2002 CFS were selected based on the 1997 version of the NAICS. For the 1993 and 1997 surveys, however, the industries were selected based on the 1987 Standard Industrial Classification System (SIC). Although attempts were made to maintain similar coverage among the SIC-based surveys (1993 and 1997) and the NAICS-based surveys (2002, 2007, and 2012), there have been some changes in industry coverage due to the conversion from SIC to NAICS. Most notably, the logging industry moved from the in-scope Manufacturing sector (SIC 2411) to the out-of-scope sector of Agriculture, Forestry, Fishing, and Hunting (NAICS 1133). Also, publishers were reclassified from Manufacturing (SIC 2711, 2721, 2731, 2741, and part of 2771) to Information (NAICS 5111 and 51223) and were excluded from the 2002 CFS. The 2007 and 2012 surveys, however, include publishers and retail fuel dealers.

The NAICS industries covered in the 2012 CFS are listed in the following table:




NAICS code   Industry
212          Mining (Except Oil and Gas)
311          Food Manufacturing
312          Beverage and Tobacco Product Manufacturing
313          Textile Mills
314          Textile Product Mills
315          Apparel Manufacturing
316          Leather and Allied Product Manufacturing
321          Wood Product Manufacturing
322          Paper Manufacturing
323 (1)      Printing and Related Support Activities (except 323122)
324          Petroleum and Coal Products Manufacturing
325          Chemical Manufacturing
326          Plastics and Rubber Products Manufacturing
327          Nonmetallic Mineral Product Manufacturing
331          Primary Metal Manufacturing
332          Fabricated Metal Product Manufacturing
333          Machinery Manufacturing
334          Computer and Electronic Product Manufacturing
335          Electrical Equipment, Appliance, and Component Manufacturing
336          Transportation Equipment Manufacturing
337          Furniture and Related Product Manufacturing
339          Miscellaneous Manufacturing
423          Wholesale Trade, Durable Goods
424          Wholesale Trade, Nondurable Goods
4541         Electronic Shopping and Mail-Order Houses
45431        Fuel Dealers
4841         General Freight Trucking
4842         Specialized Freight Trucking
4931 (2)     Warehousing and Storage
5111 (3)     Newspaper, Periodical, Book, and Directory Publishers
551114 (4)   Corporate, Subsidiary, and Regional Managing Offices

(1) Excludes Pre-Press Services (NAICS 323122).
(2) Includes only captive warehouses that provide storage and shipping support to a single company. Warehouses offering their services to the general public and other businesses are excluded. NAICS 4841 and 4842 are new industries to the 2012 CFS. For tabulation and publication purposes, NAICS 484 is grouped with NAICS 4931.
(3) In previous cycles, NAICS 51223 Music Publishers was tabulated and published in NAICS 5111. However, for the 2012 cycle, NAICS 51223 was not sampled.
(4) Includes only those establishments in NAICS 551114 with shipping activity.
Excluded industries: Establishments classified in transportation, construction and most retail and services industries are excluded. Foreign establishments are also excluded. Other industry areas that are not covered, but may have significant shipping activity, include agriculture and government. For agriculture, specifically, this means that the CFS does not cover shipments of agricultural products from the farm site to the processing centers or terminal elevators (most likely short-distance local movements), but does cover the shipments of these products from the initial processing centers or terminal elevators onward.
General exclusions: Data for government-operated establishments are excluded from the Commodity Flow Survey. These include public utilities, publicly operated bus and subway systems, public libraries, and government-owned hospitals. The Commodity Flow Survey also excludes establishments or firms with no paid employees.

Shipment Coverage
The CFS captures data on shipments originating from select types of business establishments located in the 50 states and the District of Columbia. The data do not cover shipments originating from business establishments located in Puerto Rico and other U.S. possessions and territories. Shipments traversing the U.S. from a foreign location to another foreign location (e.g., from Canada to Mexico) are not included, nor are shipments from a foreign location to a U.S. location. However, imported products are included in the CFS at the point that they leave the importer’s initial domestic location for shipment to another location. Shipments that pass through a foreign territory but have both their origin and destination in the U.S. are included in the CFS data. The mileages calculated for these shipments exclude the international segments (e.g., shipments from New York to Michigan through Canada do not include any mileage in Canada). Export shipments are included, with the domestic destination defined as the U.S. port, airport, or border crossing of exit from the U.S. See the "Mileage Calculation" section for additional detail on how mileage estimates were developed.

Sample Design

The sample for the 2012 Commodity Flow Survey (CFS) was selected using a stratified three-stage design in which the first-stage sampling units were establishments, the second-stage sampling units were groups of four 1-week periods (reporting weeks) within the survey year, and the third-stage sampling units were shipments.

First Stage – Establishment Selection

Sampling frame
To create the first-stage sampling frame, a subset of establishment records (as of July 2011) was extracted from the Census Bureau’s Business Register. The Business Register is a database of all known establishments located in the United States or its territories. An establishment is a single physical location where business transactions take place or services are performed. Establishments located in the United States, having nonzero payroll in 2010, and classified in mining (except oil and gas extraction), manufacturing, wholesale, electronic shopping and mail order, fuel dealers, and publishing industries, as defined by the 2007 North American Industry Classification System (NAICS), were included on the sampling frame. Certain manufacturers (prepress services) and wholesalers (manufacturers’ sales offices, agents and brokers, and certain importers) were excluded from the frame.

Auxiliary establishments (e.g., truck transportation facilities, warehouses, and central administrative offices) with shipping activity were also included on the sampling frame. Auxiliary establishments are establishments that are primarily involved in rendering support services to other establishments within the same company, rather than to the public, government, or other business firms. All other establishments included on the sampling frame are referred to as nonauxiliary establishments.

Establishments classified in forestry, fishing, utilities, construction, and all other transportation, retail, and services industries were not included on the sampling frame. Farms and government-owned entities (except government-owned liquor wholesalers) were also excluded from the sampling frame. The resulting frame comprised approximately 716,000 establishments as shown in the table below.

[Table: Establishments on Frame, by Trade Area]

For each establishment, sales, payroll, number of employees, a six-digit NAICS code, name and address, and a primary identifier were extracted, and a measure of size was computed. The measure of size was designed to approximate an establishment's annual total value of shipments for the year 2009.

All of the establishments included on the sampling frame had state and county geographic codes. These codes were used to assign each establishment to one of the 83 CFS metropolitan areas (CFS Areas), each defined as a state part of a Metropolitan Statistical Area (MSA) or Combined Statistical Area (CSA). Establishments not located in these specified CFS metropolitan areas were assigned to a Rest of State (ROS) CFS Area.

The sampling frame was stratified by geography, industry, and measure-of-size (MOS) class (with some exceptions for auxiliary establishments and hazardous materials establishments, as described below). The geography by industry cells form the primary strata for the main part of the sample.

Geographic strata were defined by a combination of the 50 states, the District of Columbia, and specific metropolitan areas (MAs) selected based on their population and importance as transportation gateways. MAs were defined using the 2009 OMB definitions. All other MAs were collapsed with the non-MAs within the state into Rest of State (ROS) CFS Area strata. When an MA crossed state boundaries, we considered the size of each state part of the MA relative to the MA's total measure of size when determining whether to create strata in each state in which the MA was defined. For example, the Chicago CSA makes up two CFS Areas: the IL part and the IN part. The WI part of Chicago was too small to be a separate CFS Area and was combined into the rest of Wisconsin. The table below summarizes the number of CFS Areas by type.


[Table: Number of CFS Areas by Geographic Stratum (CFS Area) Type. Rows: Actual CSA or MSA (state part); CFS Area = State (DC, RI); ROS < Whole state; Total Number of CFS Areas]


The industry strata were determined as follows. Within each of the geographic strata, 48 industry groups were defined based on the 2007 NAICS:

  • three mining (four-digit NAICS);
  • 21 manufacturing (three-digit NAICS);
  • 18 wholesale (four-digit NAICS);
  • two retail (NAICS 4541 and 45431);
  • one services (NAICS 5111); and
  • three auxiliary (combinations of NAICS 484, 4931, and 551114).

For auxiliaries that responded to the Advance Survey and were found to be shippers, 134 primary strata were created, one in each geographic stratum, combining NAICS 484, 4931, and 551114. For auxiliary establishments that did not respond to the Advance Survey, two national strata were created as follows:

  • one stratum for non-responding truck transportation establishments and warehousing and storage establishments (NAICS 484 and NAICS 4931), and
  • one stratum for non-responding corporate, subsidiary, and regional managing offices establishments (NAICS 551114).

In order to produce good estimates of shipments of hazardous materials (HAZMAT), twenty 6-digit NAICS industries with high amounts of HAZMAT shipments were identified and used to form primary strata. The 2007 CFS data were used to identify these industries and in general, these industries were chosen because:

  • they had a large (weighted) total value or total tonnage of hazardous materials, or
  • a high percentage of their (unweighted) shipments were HAZMAT shipments.

Thirteen of the 20 industries were made certainty strata and the remaining seven industries were made into primary strata defined by state and the 6-digit NAICS code.   

The table below shows the number and types of primary strata for the main, auxiliary, and HAZMAT parts of the sample. Note that we are counting the number of strata before they are further stratified by MOS size class. 


Part of the sample                        Number of Primary Strata
Main part of the sample                   6,030 (134 CFS areas x 45 industries)
Auxiliary part of the sample
   Responders to the Advance Survey       134 (134 CFS areas x 1 industry)
   Nonresponders to the Advance Survey    2 (2 industries)
HAZMAT part of the sample
   Certainty (take-all) strata            13 (13 6-digit NAICS codes)
   Noncertainty strata                    357 (51 states (incl. DC) x 7 6-digit NAICS codes)

The total desired sample size for the first stage sample was approximately 100,000 establishments and was fixed due to budget constraints. Therefore, in addition to defining the strata, a sample size was determined for each primary stratum.   This was performed as follows: 

  • A target coefficient of variation (CV) was assigned to each primary stratum (geography by industry cell).
  • Within each primary stratum, substrata defined by MOS were developed to minimize the sample size needed to achieve the target CV.  The establishments in the largest MOS size class were taken with certainty.  For the noncertainty substrata, the sample was allocated according to the Neyman allocation.
  • Once the minimum sample sizes for each primary stratum were determined, these were added together and compared to the desired total sample size of 100,000. If the total was not close enough to 100,000, we multiplied all of the target CVs by a fixed factor and repeated the process until the total sample size was close to 100,000.
  • The establishments in the geography by industry by MOS size class substrata were selected by simple random sampling without replacement. The total sample size was 102,565 establishments of which 46,265 were selected with certainty (see the table below).
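The allocation steps above can be sketched in Python. This is only an illustration under stated assumptions, not the Census Bureau's production procedure: the function names, the dictionary keys, and the step and tolerance values are hypothetical, certainty substrata are ignored, and the sample-size formula is the standard textbook result for the variance of an estimated total under Neyman allocation with simple random sampling without replacement.

```python
def neyman_allocation(sizes, std_devs, total_n):
    """Split total_n across substrata in proportion to N_h * S_h."""
    products = [n_h * s_h for n_h, s_h in zip(sizes, std_devs)]
    total = sum(products)
    return [total_n * p / total for p in products]


def required_sample_size(sizes, std_devs, stratum_total, target_cv):
    """Smallest n that meets target_cv for an estimated total when the
    sample is Neyman-allocated (finite population correction included)."""
    target_variance = (target_cv * stratum_total) ** 2
    sum_ns = sum(n_h * s_h for n_h, s_h in zip(sizes, std_devs))
    sum_ns2 = sum(n_h * s_h ** 2 for n_h, s_h in zip(sizes, std_devs))
    return sum_ns ** 2 / (target_variance + sum_ns2)


def scale_cvs_to_budget(strata, budget, tolerance=0.02, step=1.01, max_iter=10000):
    """Multiply every target CV by a common factor until the summed
    sample sizes are within tolerance of the budget (or max_iter is hit).

    strata: list of dicts with hypothetical keys
            'sizes', 'std_devs', 'total', 'target_cv'.
    Returns (factor, total_n).
    """
    factor = 1.0
    total_n = 0.0
    for _ in range(max_iter):
        total_n = sum(
            required_sample_size(
                s["sizes"], s["std_devs"], s["total"], s["target_cv"] * factor
            )
            for s in strata
        )
        if abs(total_n - budget) <= tolerance * budget:
            break
        # Too many units: relax the CVs. Too few: tighten them.
        factor = factor * step if total_n > budget else factor / step
    return factor, total_n
```

In this sketch, loosening every target CV by the same factor lowers each stratum's required sample size, which is why repeating the calculation with a rescaled CV eventually brings the total near the fixed budget.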

[Table: 2012 frame and sample by Primary Strata Type. Columns: 2012 Frame, Total MOS ($mil); 2012 Sample, Total Sample, MOS of Sampled Estabs ($mil); Certainty Component, MOS of Certainty Estabs ($mil)]

Second Stage – Reporting Week Selection

The frame for the second stage of sampling consisted of the 52 weeks in 2012. Each establishment selected into the 2012 CFS sample was systematically assigned to report for four reporting weeks, one in each quarter of the reference year. Each of the four weeks was in the same relative position within its quarter. For example, an establishment might have been requested to report data for the 5th, 18th, 31st, and 44th weeks of the reference year. In this instance, each reporting week corresponds to the 5th week of its quarter. Prior to assignment of weeks to establishments, the selected sample was sorted by primary stratum (geography x industry) and measure of size.
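The week arithmetic in the example above can be reproduced with a few lines of Python. The function names are hypothetical, and cycling through the 13 within-quarter positions along the sorted establishment list is only one plausible reading of "systematically assigned"; the source does not spell out the exact rule.

```python
WEEKS_PER_QUARTER = 13
QUARTERS = 4


def reporting_weeks(week_in_quarter):
    """Return the four weeks of the year that share one within-quarter position."""
    if not 1 <= week_in_quarter <= WEEKS_PER_QUARTER:
        raise ValueError("week_in_quarter must be between 1 and 13")
    return [week_in_quarter + WEEKS_PER_QUARTER * q for q in range(QUARTERS)]


def assign_weeks(sorted_establishments, start=1):
    """Assumed rule: walk down the sorted list, giving consecutive
    establishments consecutive within-quarter positions (wrapping at 13)."""
    return {
        est: reporting_weeks((start - 1 + i) % WEEKS_PER_QUARTER + 1)
        for i, est in enumerate(sorted_establishments)
    }
```

Calling `reporting_weeks(5)` gives weeks 5, 18, 31, and 44, matching the example in the text.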

Third Stage – Shipment Selection

For each of the four reporting weeks in which an establishment was asked to report, the respondent was requested to construct a sampling frame consisting of all shipments made by the establishment in the reporting week. Each respondent was asked to count or estimate the total number of shipments comprising the sampling frame and to record this number on the questionnaire. For each assigned reporting week, if an establishment made more than 40 shipments during that week, the respondent was asked to select a systematic sample of the establishment's shipments and to provide information only about the selected shipments. If an establishment made 40 or fewer shipments during that week, the respondent was asked to provide information about all of the establishment's shipments made during that week (i.e., no sampling was required).
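A respondent's selection step might look like the following sketch. The take-all threshold of 40 comes from the text; the sampling interval (the ceiling of the shipment count divided by 40) and the random start are assumptions made for illustration, since the questionnaire's actual interval instructions are not given here. The function name is hypothetical.

```python
import math
import random


def select_shipments(shipment_ids, max_take_all=40, rng=None):
    """Return every shipment when there are 40 or fewer; otherwise
    return a systematic sample taken at a fixed interval."""
    shipments = list(shipment_ids)
    if len(shipments) <= max_take_all:
        return shipments  # no sampling required
    rng = rng or random.Random()
    # Assumed interval: keeps the sample at or below max_take_all shipments.
    interval = math.ceil(len(shipments) / max_take_all)
    start = rng.randrange(interval)  # random start within the first interval
    return shipments[start::interval]
```

For a week with 100 shipments this sketch takes every third shipment, so 33 or 34 shipments are reported depending on the random start.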

Data Collection

Each establishment selected into the CFS sample was mailed a questionnaire for each of its four reporting weeks; that is, an establishment was sent a questionnaire once every quarter of 2012. For a given establishment, the respondent was asked to provide the following information about each of the establishment's reported shipments:

  • Shipment ID number
  • Shipment date (Month, Day)
  • Shipment value
  • Shipment weight in pounds
  • Commodity code from Standard Classification of Transported Goods (SCTG) list
  • Commodity description
  • An indication of whether the shipment was temperature controlled
  • United Nations or North America (UN/NA) number for hazardous material shipments
  • U.S. destination (city, state, zip code) – or gateway for export shipment
  • Mode(s) of transport
  • City and country of destination for exports
  • Export mode

The 2012 CFS questionnaire included questions about the use and extent of rush delivery services. Using a two-part question, respondents were asked if any of the shipments they had reported on the questionnaire were sent using “rush delivery services,” and if so, they were asked to provide the number of shipments that were sent by each of the following methods: 1) Same day/Overnight, 2) 2-3 business days, and 3) More than 3 business days.

For a shipment that included more than one commodity, the respondent was instructed to report the commodity that made up the greatest percentage of the shipment's weight.

Imputation of Shipment Value or Weight

Only two items were imputed in the 2012 CFS: shipment value and shipment weight. To correct for nonresponse or an unacceptable entry in either the value or weight item for a given shipment, the missing item (or the one that failed edit) was replaced by a predicted value obtained from an appropriate model. Such a shipment was considered a "recipient" if it had a valid commodity code and the other item reported (either shipment value or shipment weight) was greater than zero and had passed edit. The recipient's item that was missing or failed edit was imputed as follows. First, a "donor" shipment was randomly selected from shipments that were reported in the CFS with:

  • The same commodity code as the recipient.
  • Both value and weight items reported, greater than zero, and passing edit.
  • Similar origin and value for the item reported by the recipient.

Then, the donor's value and weight data were used to calculate a ratio, which was then applied to the recipient's reported item, to impute the item that was missing or failed edit. If no donor was found, the median ratio for all shipments reported in the survey with the same commodity code as the recipient - and with both value and weight items reported greater than zero - was applied to the recipient's reported item. For either the value or weight item, approximately three percent of the shipment records used for the calculation of estimates had imputed data.
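The ratio imputation described above can be sketched as follows. This is a simplified illustration, not the production model: the function name and record layout are hypothetical, and the caller is assumed to have already filtered `donors` to shipments with the same commodity code, a similar origin, and a similar reported item.

```python
import random
from statistics import median


def impute_missing_item(recipient, donors, same_commodity, rng=None):
    """Fill in the missing 'value' or 'weight' of a recipient shipment.

    recipient:      dict with 'value' and 'weight'; exactly one is None.
    donors:         complete shipments judged similar to the recipient.
    same_commodity: all complete shipments with the recipient's commodity
                    code, used for the median-ratio fallback.
    """
    rng = rng or random.Random()
    if donors:
        donor = rng.choice(donors)  # random donor selection
        ratio = donor["value"] / donor["weight"]
    else:
        # No donor found: fall back to the median value-to-weight ratio.
        ratio = median(s["value"] / s["weight"] for s in same_commodity)
    imputed = dict(recipient)
    if imputed["value"] is None:
        imputed["value"] = imputed["weight"] * ratio
    else:
        imputed["weight"] = imputed["value"] / ratio
    return imputed
```

The same value-to-weight ratio serves both directions: it is multiplied by a reported weight to impute value, or divided into a reported value to impute weight.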


Estimated totals (e.g., value of shipments, tons, ton-miles) were produced as the sum of weighted shipment data (reported or imputed). Percent change and percent-of-total estimates were derived using the appropriate estimated totals. Estimates of average miles per shipment were computed by dividing an estimate of the total miles traveled by the estimated number of shipments.

Each shipment had associated with it a single tabulation weight, which was used in computing all estimates to which the shipment contributes. The tabulation weight was a product of seven different component weights. A description of each component weight follows.

CFS respondents provided data for a sample of shipments made by their respective establishments in the survey year. For each establishment, an estimate of that establishment's total value of shipments was produced for the entire survey year. To do this, four different weights were used - the shipment weight, the shipment nonresponse weight, the quarter weight, and the quarter nonresponse weight. Three additional weights were then applied to produce estimates representative of the entire universe - the establishment-level adjustment weight, the establishment (or sample) weight, and the industry-level adjustment weight.

Like establishments, shipments were identified as either certainty or noncertainty (see the Nonsampling Error section below). For noncertainty shipments, the shipment weight was defined as the ratio of the reported total number of shipments made by an establishment in a reporting week to the number of sampled shipments for the same week. This weight used data from the sampled shipments to represent all the establishment's shipments made in the reporting week. However, a respondent may have failed to provide sufficient information about a particular sampled shipment. For example, a respondent may not have been able to provide value, weight, or a destination for one of the sampled shipments. If this data item could not be imputed, then this shipment did not contribute to tabulations and was deemed unusable. (A usable shipment is one that has valid entries for value, weight, and origin and destination ZIP Codes.) To account for these unusable shipments, a shipment nonresponse weight was applied. For noncertainty shipments from a particular establishment's reporting week, the weight was equal to the ratio of the number of sampled shipments for the reporting week to the number of usable shipments for the same week. The shipment weight for certainty shipments from a particular establishment's reporting week was equal to one.

The quarter weight inflated an establishment's estimate for a particular reporting week to an estimate for the corresponding quarter. For noncertainty shipments, the quarter weight was equal to 13. The quarter weight for most certainty shipments was also equal to 13. However, if a respondent was able to provide information about all large (or certainty) shipments made in the quarter containing the reporting week, then the quarter weight for each of these shipments was one. For each establishment, the quarterly estimates were added to produce an estimate of the establishment's value of shipments for the entire survey year. Whenever an establishment did not provide the Census Bureau with a response for each of its four reporting weeks, a quarter nonresponse weight was computed. The quarter nonresponse weight for a particular establishment was defined as the ratio of the number of quarters for which the establishment was in business in the survey year to the total number of quarters (reporting weeks) for which usable shipment data were received from the establishment.
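The four within-establishment weights described so far combine multiplicatively. The sketch below applies them to noncertainty shipments only; the function name and the dictionary keys are hypothetical, and certainty shipments (which carry a shipment weight of one) are left out to keep the example short.

```python
def annual_value_estimate(weeks, quarters_in_business=4, quarter_weight=13):
    """Estimate an establishment's annual value of shipments.

    weeks: one dict per reporting week, with hypothetical keys
           'total_shipments'   - shipments made that week (reported count),
           'sampled_shipments' - shipments selected by the respondent,
           'usable_values'     - values of the usable sampled shipments.
    """
    responding = [w for w in weeks if w["usable_values"]]
    if not responding:
        raise ValueError("no usable shipment data for this establishment")
    total = 0.0
    for w in responding:
        # Shipment weight: all shipments in the week / sampled shipments.
        shipment_wt = w["total_shipments"] / w["sampled_shipments"]
        # Shipment nonresponse weight: sampled shipments / usable shipments.
        nonresponse_wt = w["sampled_shipments"] / len(w["usable_values"])
        total += sum(w["usable_values"]) * shipment_wt * nonresponse_wt * quarter_weight
    # Quarter nonresponse weight: quarters in business / quarters with data.
    quarter_nonresponse_wt = quarters_in_business / len(responding)
    return total * quarter_nonresponse_wt
```

As a check on the arithmetic, an establishment that reports for two of four quarters, with 100 shipments per week, 25 sampled, and 20 usable at $10 each, gets 200 x 4 x 1.25 x 13 = 13,000 per week, 26,000 over two weeks, and 52,000 after the quarter nonresponse weight of 2.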

Using these four component weights, an estimate of each establishment's value of shipments was computed for the entire survey year. This estimate was then multiplied by a factor that adjusts the estimate using value of shipments and sales data obtained from other surveys and censuses conducted by the Census Bureau. This weight, the establishment-level adjustment weight, attempted to correct for any sampling or nonsampling errors that occurred during the sampling of shipments by the respondent.

The adjusted value of shipments estimate for an establishment was then weighted by the establishment (or sample) weight. This weight was equal to the reciprocal of the establishment's probability of being selected into the first stage sample.

A final adjustment weight, the industry-level adjustment weight, used information from other 2012 surveys and censuses conducted by the Census Bureau to account for:

  • establishments which did not respond to the survey or from which we did not receive any usable shipment data and
  • changes in the universe of establishments between the time the first-stage sampling frame was constructed (2011) and the year in which the data were collected (2012).

For the preliminary estimates, these industry-level adjustments were made by state at the three-digit (Manufacturing) or four-digit (all other industries) NAICS levels.  There were approximately 45 separate industry adjustment factors computed.

Reliability of the Estimates

The estimates presented by the 2012 CFS may differ from the actual, unknown population values. Statisticians define this difference as the total error of the estimate. When describing the accuracy of survey results, it is convenient to discuss total error as the sum of sampling error and nonsampling error. Sampling error is the average difference between the estimate and the result that would be obtained from a complete enumeration of the sampling frame conducted under the same survey conditions. Nonsampling error encompasses all other factors that contribute to the total error of a sample survey estimate.

The sampling error of the estimates in this publication can be estimated from the selected sample because the sample was selected using probability sampling. Common measures related to sampling error are the sampling variance, the standard error, and the coefficient of variation (CV). The sampling variance is the squared difference, averaged over all possible samples of the same size and design, between the estimator and its average value. The standard error is the square root of the sampling variance. The CV expresses the standard error as a percentage of the estimate to which it refers.
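The definitions above can be made concrete by enumerating every possible sample from a tiny population. The sketch below assumes a simple random sample without replacement and an expansion estimator of the total, which is far simpler than the three-stage CFS design; the function name is hypothetical, and the CV is expressed relative to the estimator's average value because no single sample is singled out.

```python
from itertools import combinations
from math import sqrt


def sampling_error_measures(population, n):
    """Return (sampling variance, standard error, CV in percent) of the
    estimated total, averaged over all possible samples of size n."""
    big_n = len(population)
    # One estimate of the population total per possible sample.
    estimates = [big_n * sum(s) / n for s in combinations(population, n)]
    mean_estimate = sum(estimates) / len(estimates)
    variance = sum((e - mean_estimate) ** 2 for e in estimates) / len(estimates)
    standard_error = sqrt(variance)
    cv_percent = 100 * standard_error / mean_estimate
    return variance, standard_error, cv_percent
```

For the population 1, 2, 3, 4, 5 and samples of size 2, the ten possible estimates average to the true total of 15 and the sampling variance is 18.75.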

Nonsampling errors are difficult to measure and can be introduced through inadequacies in the questionnaire, nonresponse, inaccurate reporting by respondents, errors in the application of survey procedures, incorrect recording of answers, and errors in data entry and processing. In conducting the 2012 CFS, every effort was made to minimize the effect of nonsampling errors on the estimates. Data users should take into account both the measures of sampling error and the potential effects of nonsampling error when using these estimates.

More detailed descriptions of sampling and nonsampling errors for the 2012 CFS are provided in the following sections.

Sampling Error

Because the estimates are based on a sample, exact agreement with results that would be obtained from a complete enumeration of all shipments made in 2012 from all establishments included on the sampling frame using the same enumeration procedures is not expected. However, because probability sampling was used at each stage of selection, it is possible to estimate the sampling variability of the survey estimates. For CFS estimates, sampling variability arises from each of the three stages of sampling.

The particular sample used in this survey is one of a large number of samples of the same size that could have been selected using the same design. If all possible samples had been surveyed under the same conditions, an estimate of a population parameter of interest could have been obtained from each sample. These samples give rise to a distribution of estimates for the population parameter. A statistical measure of the variability among these estimates is the standard error, which can be approximated from any one sample. The standard error is defined as the square root of the variance. The coefficient of variation (CV, or relative standard error) of an estimator is the standard error of the estimator divided by the estimator. For the 2012 CFS, the coefficient of variation also incorporates the effect of the noise infusion disclosure avoidance method. Note that measures of sampling variability, such as the standard error and coefficient of variation, are estimated from the sample and are also subject to sampling variability, and technically they should have been referred to as estimated standard error and estimated coefficient of variation. However, for the sake of brevity, we have omitted this detail. It is important to note that the standard error only measures sampling variability. It does not measure systematic biases of the sample. Individuals using estimates contained in this report are advised to incorporate this information into their analyses, as sampling error could affect the conclusions drawn from these estimates.

An estimate from a particular sample and the standard error associated with the estimate can be used to construct a confidence interval. A confidence interval is a range about a given estimator that has a specified probability of containing the result of a complete enumeration of the sampling frame conducted under the same survey conditions. Associated with each interval is a percentage of confidence, which is interpreted as follows. If, for each possible sample, an estimate of a population parameter and its approximate standard error were obtained, then:

  1. For approximately 90 percent of the possible samples, the interval from 1.645 standard errors below to 1.645 standard errors above the estimate would include the result as obtained from a complete enumeration of the sampling frame conducted under the same survey conditions.
  2. For approximately 95 percent of the possible samples, the interval from 1.96 standard errors below to 1.96 standard errors above the estimate would include the result as obtained from a complete enumeration of the sampling frame conducted under the same survey conditions.

To illustrate the computation of a confidence interval for an estimate of total value of shipments, assume that an estimate of total value is $10,750 million and the coefficient of variation for this estimate is 1.8 percent, or 0.018. First obtain the standard error of the estimate by multiplying the value of shipments estimate by its coefficient of variation. For this example, multiply $10,750 million by 0.018. This yields a standard error of $193.5 million. The upper and lower bounds of the 90-percent confidence interval are computed as $10,750 million plus or minus 1.645 times $193.5 million. Consequently, the 90-percent confidence interval is $10,432 million to $11,068 million. If corresponding confidence intervals were constructed for all possible samples of the same size and design, approximately 9 out of 10 (90 percent) of these intervals would contain the result obtained from a complete enumeration.
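The worked example above translates directly into a short Python function. The function name is hypothetical; the multipliers 1.645 and 1.96 are the ones given in the text for 90-percent and 95-percent intervals.

```python
def confidence_interval(estimate, cv, multiplier=1.645):
    """Return (lower, upper) bounds given an estimate and its CV.

    cv is a proportion (0.018 for 1.8 percent); the default multiplier
    of 1.645 gives a 90-percent interval, 1.96 a 95-percent interval.
    """
    standard_error = estimate * cv
    margin = multiplier * standard_error
    return estimate - margin, estimate + margin


# Values from the text, in millions of dollars.
lower, upper = confidence_interval(10750, 0.018)
print(round(lower), round(upper))
```

With the figures from the text, the bounds round to 10,432 and 11,068 (in millions of dollars).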

Nonsampling Error

Nonsampling error encompasses all other factors that contribute to the total error of a sample survey estimate and may also occur in censuses. It is often helpful to think of nonsampling error as arising from deficiencies or mistakes in the survey process. In the CFS, nonsampling error can be attributed to many sources:

  • inability to obtain information about all units in the sample,
  • response errors,
  • differences in the interpretation of the questions,
  • mistakes in coding or keying the data obtained, and
  • other errors of collection, response, coverage, and processing.

Although no direct measurement of the potential biases due to nonsampling error has been obtained, precautionary steps were taken in all phases of the collection, processing, and tabulation of the data in an effort to minimize their influence. Individuals using estimates in this report should incorporate this information into their analyses, as nonsampling error could affect the conclusions drawn from these estimates.

A potential source of bias in the estimates is nonresponse. Nonresponse is defined as the inability to obtain all the intended measurements or responses from all units in the sample. Four levels of nonresponse can occur in the CFS:

  • item,
  • shipment,
  • quarter (reporting week), and
  • establishment.

Item nonresponse occurs when either a question is unanswered or the response to the question fails computer or analyst edits. Nonresponse to the shipment value or weight items is corrected by imputation, which is the procedure by which a missing value is replaced by a predicted value obtained from an appropriate model.

Shipment, quarter, and establishment nonresponse describe the inability to obtain any of the substantive measurements about a sampled shipment, quarter, or establishment, respectively. Shipment and quarter nonresponse are corrected by reweighting. Reweighting allocates characteristics to the nonrespondents in proportion to the characteristics observed for the respondents. The amount of bias introduced by this nonresponse adjustment procedure depends on the extent to which the nonrespondents differ, characteristically, from the respondents. Establishment nonresponse is corrected during the estimation procedure by the industry-level adjustment weight. In most cases of establishment nonresponse, none of the four questionnaires had been returned to the Census Bureau despite several attempts to elicit a response. Approximately 56 percent of the establishments provided at least one quarter of data that contributed to these tables.
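The idea behind reweighting can be illustrated with a minimal Python sketch. The text does not give the adjustment formula used in the CFS, so the sketch assumes a simple weighting-class adjustment: within one adjustment cell, respondent weights are scaled up so that they sum to the total weight of all sampled units in the cell. The function name, the cell, and the weights are hypothetical.

```python
def reweight_for_nonresponse(weights, responded):
    """Scale respondent weights so they also represent the nonrespondents.

    weights   -- sampling weights of all sampled units in one adjustment cell
    responded -- parallel list of booleans, True where the unit responded
    Assumption: a simple ratio adjustment, not the documented CFS procedure.
    """
    total_weight = sum(weights)
    respondent_weight = sum(w for w, r in zip(weights, responded) if r)
    if respondent_weight == 0:
        raise ValueError("no respondents in this adjustment cell")
    factor = total_weight / respondent_weight
    # Nonrespondents receive a weight of zero; respondents are scaled up.
    return [w * factor if r else 0.0 for w, r in zip(weights, responded)]


# Hypothetical cell of four sampled units, one of which did not respond.
adjusted = reweight_for_nonresponse([10.0, 10.0, 20.0, 20.0],
                                    [True, False, True, True])
print(adjusted)
```

Under this assumption the adjusted weights still sum to the original cell total, which is why the bias depends on how much the nonrespondents differ from the respondents.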

Some possible sources of bias that are attributed to respondent-conducted sampling include:

  • misunderstanding the definition of a shipment,
  • constructing an incomplete frame of shipments from which to sample,
  • ordering the shipment sampling frame by selected shipment characteristics, and
  • selecting shipment records by a method other than the one specified in the questionnaire's instructions.

Respondents who reported a shipment with an atypically large value or weight compared with the rest of their reported shipments were often contacted for verification. In such cases, if we were able to collect information on all of the large shipments a respondent had made, either for a particular reporting week or for the entire quarter, then we identified those large shipments as certainty shipments.

Mileage Calculation

The CFS does not ask respondents to report the distance traveled for each shipment.  Therefore, shipment mileages were calculated using GeoMiler, a routing tool developed by BTS specifically for CFS mileage calculations.  This software tool used current Geographic Information System (GIS) technology and spatial multimodal network databases.  It integrated map-visualization features with route solvers to handle many alternative multimodal combinations.  This tool used algorithms that found the “best path”, which is the quickest path and not necessarily the shortest path, over spatial representations of the U.S. highway, railway, waterway, and airway networks. For waterborne export shipments, GeoMiler used a waterborne commerce database from the U.S. Army Corps of Engineers to route freight originating in the U.S. via the deep sea (ocean). For airborne export shipments, GeoMiler used a newly developed air export network from the BTS Office of Airline Information (OAI).

For a domestic shipment, the mileage was calculated between the centroid (center of the geographic area) of the U.S. origin ZIP Code and the centroid of the destination ZIP Code. For shipments where the origin and destination were within the same ZIP Code (intra-ZIP shipments), the square root of the total ZIP Code area in square miles was used as an estimate for the distance shipped.
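The intra-ZIP rule amounts to a one-line calculation, sketched here in Python. The function name and the example area are illustrative and are not taken from GeoMiler.

```python
import math


def intra_zip_distance(zip_area_sq_miles):
    """Estimate miles shipped when origin and destination share a ZIP Code.

    Per the rule described in the text, the estimate is the square root
    of the total ZIP Code area in square miles.
    """
    if zip_area_sq_miles < 0:
        raise ValueError("area must be non-negative")
    return math.sqrt(zip_area_sq_miles)


# Hypothetical ZIP Code covering 25 square miles.
print(intra_zip_distance(25.0))  # 5.0
```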

For multimodal shipments (shipments involving more than one mode, such as truck-rail shipments), spatial joins (intermodal transfer links) were used to connect the individual modal networks together for routing purposes. An intermodal terminals database and a number of terminal transfer models were developed at BTS to identify likely transfer points.  An algorithm was used to find the minimum impedance path from a shipment’s origin ZIP Code to the transfer point and then from the transfer point to the destination ZIP Code. The cumulative length of the spatial joins, plus links on the path, provided the estimated distances used in CFS mileage calculations.
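A minimum impedance search of this kind can be sketched with Dijkstra's algorithm. The text does not say which algorithm GeoMiler uses, so this is an illustration under that assumption; the toy network, its node names, and all impedance and mileage values are hypothetical. Each link carries an impedance (used to choose the path) and a length in miles (summed to give the reported distance), and the transfer link plays the role of a spatial join between the highway and rail networks.

```python
import heapq


def min_impedance_path(graph, origin, destination):
    """Return (path, total_miles) for the lowest-impedance path.

    graph maps each node to a list of (neighbor, impedance, miles) links.
    """
    # Heap entries: (impedance so far, miles so far, node, path taken).
    queue = [(0.0, 0.0, origin, [origin])]
    settled = set()
    while queue:
        impedance, miles, node, path = heapq.heappop(queue)
        if node == destination:
            return path, miles
        if node in settled:
            continue
        settled.add(node)
        for neighbor, link_impedance, link_miles in graph.get(node, []):
            if neighbor not in settled:
                heapq.heappush(queue, (impedance + link_impedance,
                                       miles + link_miles,
                                       neighbor,
                                       path + [neighbor]))
    raise ValueError("destination is not reachable from origin")


# Hypothetical truck-rail network.
network = {
    "origin_zip": [("truck_terminal", 1.0, 40.0)],       # highway link
    "truck_terminal": [("rail_terminal", 2.0, 1.0)],     # intermodal transfer link
    "rail_terminal": [("rail_ramp", 12.0, 850.0)],       # rail link
    "rail_ramp": [("dest_zip", 1.5, 30.0)],              # highway link
}

path, miles = min_impedance_path(network, "origin_zip", "dest_zip")
print(path)
print(miles)  # 921.0: transfer link plus the links on the path
```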

The mileage for an export shipment was calculated between the centroid of a U.S. origin ZIP Code and the border crossing on the path of minimum impedance to the foreign destination country (foreign city in the case of Canada and Mexico). For all exports, a POE (seaport, airport, or border crossing) was found.  Only the portion of mileage measured within U.S. borders was included as domestic mileage in the CFS estimates.

Methodological Changes to Mileage Calculation for the 2012 CFS

BTS continues to seek improvements to the quality of the information produced from its flagship vehicle for data collection, the CFS. A critical measurement calculated from CFS data is the mileage traveled by each shipment. This measurement is used to calculate the ton-miles, a statistic unique to this survey.

Given a valid origin and destination ZIP Code, GeoMiler calculates the distance traveled (in miles) by mode for each shipment reported in the CFS.
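The relationship between shipment weight, mileage, and ton-miles can be shown with a small Python sketch. It assumes weight is recorded in pounds and that a ton is a short ton of 2,000 pounds; the function name and example figures are illustrative.

```python
POUNDS_PER_TON = 2_000  # assumption: short tons


def ton_miles(weight_pounds, miles):
    """Ton-miles for one shipment: weight in tons multiplied by miles traveled."""
    return (weight_pounds / POUNDS_PER_TON) * miles


# Hypothetical shipment: 40,000 pounds moved 250 miles.
print(ton_miles(40_000, 250))  # 5000.0
```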

The following types of methodological changes to mileage processing were incorporated in 2012:

  • A shipment with a respondent-provided mode of Parcel had to weigh 150 pounds or less, whereas a shipment with a respondent-provided mode of Air was given no weight restriction;
  • A mode of transportation was imputed whenever a respondent provided a mode of Other or Unknown, or otherwise failed to provide a modal response (missing mode) for a shipment; and
  • Private truck is considered a "short-haul" mode; hence, Private truck shipments routed 500 miles or more were converted to For-hire truck.

Air versus Parcel Mode
According to the 2007 CFS Instruction Guide, an Air shipment was defined as a shipment that weighed 100 pounds or more. During mileage processing for the 2007 CFS, an Air shipment was manually converted to Parcel if the weight of the shipment was less than 100 pounds.

However, airlines do not necessarily have minimum weight restrictions when transporting cargo. Hence, for the 2012 CFS, the definition of an Air shipment was changed. As a result, an Air shipment was acceptable as provided by the respondent, regardless of weight.

Furthermore, for the 2012 CFS, Parcel shipments conformed to the parcel industry's definition of a parcel as a shipment of 150 pounds or less. For shipments submitted by the respondent with a mode of Parcel and a weight above 150 pounds, GeoMiler changed the mode to For-hire truck during mileage processing.
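The 2012 Air and Parcel rules can be expressed as a small Python sketch. The function name and the mode strings are illustrative and do not reflect GeoMiler's internal representation.

```python
PARCEL_MAX_POUNDS = 150  # parcel industry limit cited in the text


def resolve_air_parcel_mode(reported_mode, weight_pounds):
    """Apply the 2012 rules to a respondent-provided mode.

    Air is accepted at any weight; Parcel above 150 pounds becomes
    For-hire truck; every other mode is returned unchanged.
    """
    if reported_mode == "Parcel" and weight_pounds > PARCEL_MAX_POUNDS:
        return "For-hire truck"
    return reported_mode


print(resolve_air_parcel_mode("Parcel", 150))  # Parcel
print(resolve_air_parcel_mode("Parcel", 151))  # For-hire truck
print(resolve_air_parcel_mode("Air", 20))      # Air
```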

Routing a Shipment When Mode Is Other, Unknown, or Missing
On the survey form, respondents were given the following choices for mode of transport: Air, Highway (Private truck or For-hire truck), Rail, Waterway (Inland water or Deep sea), Parcel, Pipeline, Other Mode (meaning none of the above), or Unknown.

During the 2007 CFS mileage processing, 2.4 percent of shipments had a respondent-provided mode of Unknown or Other, and an additional 2.1 percent had no reported mode at all. Since all shipments must be properly routed to calculate a distance traveled, imputations were made.

For 2012 CFS mileage processing, a shipment with a mode of Other, Unknown, or missing was assigned a mode based on its weight: if the shipment weighed less than 80,000 pounds, it was routed via Highway mode as For-hire truck; if the shipment weighed 80,000 pounds or more, it was routed via Rail mode.
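The weight-based imputation can be sketched in Python as follows. The function name, the mode strings, and the use of None or an empty string for a missing mode are assumptions made for illustration.

```python
RAIL_THRESHOLD_POUNDS = 80_000  # threshold stated in the text


def impute_mode(reported_mode, weight_pounds):
    """Impute a routable mode when the reported mode is Other, Unknown, or missing."""
    needs_imputation = reported_mode in (None, "", "Other", "Unknown")
    if not needs_imputation:
        return reported_mode
    if weight_pounds < RAIL_THRESHOLD_POUNDS:
        return "For-hire truck"  # routed via Highway mode
    return "Rail"


print(impute_mode("Unknown", 79_999))  # For-hire truck
print(impute_mode(None, 80_000))       # Rail
print(impute_mode("Pipeline", 500))    # Pipeline
```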

Private Truck versus For-Hire Truck
Shipments via Private truck are generally "short-haul" in nature. Because of the number of shipments exceeding this norm in the 2007 CFS, Census Bureau analysts researched the Private truck shipments at or above 500 miles. In almost all cases, the mode should have been reported as For-hire truck instead of Private truck.

Consequently, for 2012 CFS GeoMiler mileage processing, Private truck was converted to For-hire truck if the shipment mileage was equal to or greater than 500 miles, regardless of the commodity being transported. The 2012 CFS preliminary data show that the average miles per shipment for Private truck decreased from 2007, to 46 miles per shipment.
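The conversion rule depends only on the reported mode and the routed mileage, as this Python sketch shows. The function name and mode strings are illustrative.

```python
PRIVATE_TRUCK_LIMIT_MILES = 500  # threshold stated in the text


def resolve_truck_mode(reported_mode, routed_miles):
    """Convert Private truck to For-hire truck at or above 500 routed miles."""
    if reported_mode == "Private truck" and routed_miles >= PRIVATE_TRUCK_LIMIT_MILES:
        return "For-hire truck"
    return reported_mode


print(resolve_truck_mode("Private truck", 499.9))  # Private truck
print(resolve_truck_mode("Private truck", 500.0))  # For-hire truck
```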