Overview
Lightcast offers employment data by industry and occupation at the ZIP code level. We begin with Lightcast’s final county-level industry data. ZIP-level industry data is created by disaggregating county-level industry data down to the ZIP level with the help of several outside sources. ZIP-level occupation data is created by applying staffing patterns to ZIP-level industry data.
ZIP data should be used with certain cautions in mind. We outline these after describing the process by which Lightcast's ZIP-level estimates are created.
Creation of ZIP-Level Data
The first section describes the creation of industry estimates, and the second section outlines occupation estimates.
Modeling Industry Data from County to ZIP
The backbone of ZIP-level data is Lightcast county-level data, which is built using the BLS’s Quarterly Census of Employment and Wages (QCEW) dataset, the most complete and trustworthy source of employment data available in the United States. We use these numbers as the foundation for ZIP-level data, ensuring that employment at the ZIP level exactly matches employment at the county level.
To model the county-level industry data down to the ZIP level, we use DBUSA business listings to create percentages of employment among ZIPs and industries within a county. For instance, if Lightcast county data shows that a 3-ZIP county has employment of 200 in industry x, and DBUSA shows employment ratios of 57%, 43%, and 0% for that industry in the ZIPs in that county, we will assign 114 jobs, 86 jobs, and 0 jobs for that industry to the respective ZIPs in the county.
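The proportional split described above can be sketched as follows. This is a minimal illustration, not Lightcast's actual implementation: the function name, the ZIP labels, and the largest-remainder rounding step are assumptions, since the text does not say how fractional jobs are handled.

```python
def allocate_to_zips(county_jobs, listing_jobs_by_zip):
    """Split a county-industry job total across ZIPs in proportion to listing employment.

    Hypothetical helper: whole-number rounding by largest remainder is an assumption.
    """
    total_listed = sum(listing_jobs_by_zip.values())
    if total_listed == 0:
        raise ValueError("no listing employment; a fallback distribution is needed")

    allocated = {}
    remainders = {}
    for zip_code, listed in listing_jobs_by_zip.items():
        # Integer share of the county total, plus the remainder left by the division.
        allocated[zip_code], remainders[zip_code] = divmod(county_jobs * listed, total_listed)

    # Give any leftover jobs to the ZIPs with the largest remainders,
    # so the ZIP totals add up exactly to the county total.
    leftover = county_jobs - sum(allocated.values())
    for zip_code in sorted(remainders, key=remainders.get, reverse=True)[:leftover]:
        allocated[zip_code] += 1
    return allocated


# Listing employment of 57, 43, and 0 corresponds to shares of 57%, 43%, and 0%.
print(allocate_to_zips(200, {"ZIP_A": 57, "ZIP_B": 43, "ZIP_C": 0}))
# {'ZIP_A': 114, 'ZIP_B': 86, 'ZIP_C': 0}
```

Working in integers avoids floating-point drift, and the remainder step keeps the ZIP figures summing to the county figure.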
If Lightcast’s county-level data contains employment for an industry, but DBUSA shows no employment for the industry, we move up to the parent 5-digit NAICS and check DBUSA again. We repeat this one level at a time, as far as the 2-digit NAICS level, until DBUSA shows employment.
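One way to picture the walk up the NAICS hierarchy is the sketch below. The function name and the lookup structure are hypothetical, and the code simply drops one trailing digit per step. That is a simplification: some 2-digit sectors span several codes (for example 31-33), which this sketch does not handle.

```python
def find_listing_shares(naics_code, shares_by_naics):
    """Return (matched_code, shares) from the most detailed NAICS level with listing data.

    shares_by_naics is a hypothetical lookup of {naics_code: {zip_code: share}}
    for one county. Returns (None, None) if nothing is found down to 2 digits.
    """
    code = naics_code
    while len(code) >= 2:
        shares = shares_by_naics.get(code)
        if shares:
            return code, shares
        code = code[:-1]  # drop the last digit to move to the parent level
    return None, None


# Illustrative lookup: listing data exists only at the 4-digit level.
lookup = {"7225": {"ZIP_A": 0.6, "ZIP_B": 0.4}}
print(find_listing_shares("722511", lookup))
```

When the function returns `(None, None)`, a default distribution for the county would be needed instead.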
We use USPS’s DelStat dataset to create default fallback proportions for each county in case no DBUSA data is available for a given county-industry combination. DelStat provides business address counts by ZIP. We create a default proportion for each county by counting the number of business addresses in each ZIP within the county. This gives each county its own business address percentage map, showing what percent of the county’s business addresses are in each ZIP. If the initial method of using DBUSA to assign employment for an industry to ZIPs doesn’t work, we fall back to the county’s default percentage map to distribute employment for that industry. The fallback method is necessary in only 0.5% of cases.
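The default percentage map and the fallback choice might look like the following sketch. The function names and the input layout are assumptions made for illustration; the address counts are invented.

```python
def default_zip_shares(business_addresses_by_zip):
    """Turn business address counts per ZIP into each ZIP's share of the county total."""
    total = sum(business_addresses_by_zip.values())
    if total == 0:
        return {}
    return {zip_code: count / total for zip_code, count in business_addresses_by_zip.items()}


def choose_shares(listing_shares, county_default_shares):
    """Prefer listing-based shares; use the county default map only when they are missing."""
    return listing_shares if listing_shares else county_default_shares


# Hypothetical county with 300 and 100 business addresses in its two ZIPs.
county_default = default_zip_shares({"ZIP_A": 300, "ZIP_B": 100})
print(choose_shares(None, county_default))
```

Because the default map is built once per county, the same shares apply to every industry in that county that needs the fallback.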
Modeling ZIP Occupation Data from ZIP Industry Data
Lightcast ZIP occupation data is created in the same way as Lightcast county occupation data: we use regionalized staffing patterns created from the BLS’s Occupational Employment Statistics (OES) dataset. OES provides a national-level staffing pattern, which we regionalize using regional industry and occupation data for each OES substate region. These staffing patterns are then applied to Lightcast county-level industry data, producing county-level occupation data, and are also applied to ZIP-level industry data, producing ZIP-level occupation data.
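Applying a staffing pattern to industry employment amounts to a weighted sum, as in this sketch. The data layout, the function name, and the industry and occupation labels are hypothetical, and the shares are invented for illustration.

```python
from collections import defaultdict


def occupation_jobs(industry_jobs, staffing_pattern):
    """Estimate occupation employment from industry employment.

    industry_jobs: {industry: jobs} for one ZIP (or county).
    staffing_pattern: {industry: {occupation: share of that industry's jobs}}.
    """
    totals = defaultdict(float)
    for industry, jobs in industry_jobs.items():
        for occupation, share in staffing_pattern.get(industry, {}).items():
            totals[occupation] += jobs * share
    return dict(totals)


zip_industry_jobs = {"IND_X": 100, "IND_Y": 40}
pattern = {
    "IND_X": {"OCC_1": 0.5, "OCC_2": 0.5},
    "IND_Y": {"OCC_1": 0.25, "OCC_3": 0.75},
}
print(occupation_jobs(zip_industry_jobs, pattern))
# {'OCC_1': 60.0, 'OCC_2': 50.0, 'OCC_3': 30.0}
```

The same function works for county-level and ZIP-level inputs, which mirrors how one set of staffing patterns is applied at both levels.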
Cautions About ZIP-Level Data
Users should keep several things in mind when using Lightcast ZIP-level data. First, unlike states and counties, ZIP codes are not officially defined, geographically bounded areas. Second, Lightcast uses the Post Office's ZIP code definitions, which are updated monthly. Third, no source of complete ZIP-level data exists. Finally, Lightcast ZIP-level data is not a time series.
ZIPs Are Not Geographies
ZIP codes are collections of addresses used by the Post Office to efficiently deliver mail. Many ZIPs in the United States are points. For instance, a Post Office building, a large apartment complex, or a business may have its own ZIP code. The U.S. even has one floating ZIP code.
Many institutions, such as the Census Bureau and the Department of Housing & Urban Development, maintain their own flavor of ZIP codes with different updating schedules, and therefore very different definitions at any given time of what constitutes a ZIP code. Because there is no official definition, ZIP code data rarely matches between any two given sources. Differences often come down to the underlying ZIP definitions used, as well as what source was used to render those ZIP codes visually in the case of GIS or mapping software.
USPS Monthly ZIP Update Files
The Post Office's ZIP definitions change monthly as carrier routes morph. Lightcast defines ZIP codes using the latest definition available from the Post Office at the time of each quarterly datarun. Many other sources of ZIP code data use ZIP code definitions from sources other than the Post Office, or their definitions are out of sync with the version currently used in Lightcast data at any given time.
Complete ZIP Data Does Not Exist
Because ZIP codes are not official geographies (unlike Census Tracts and Blocks), complete data for ZIP codes does not exist in the United States. The Census Bureau's LEHD LODES dataset is available at the ZIP level, but only provides data for 2-digit NAICS. The Census Bureau's ZIP Code Business Patterns (ZBP) dataset is the closest thing to a complete set, but even it is fairly incomplete.
ZIP Data Not a Time Series
Finally, Lightcast ZIP code data is not a time series. We take a time series of county-level data and apply breakout percentages based on current DBUSA to each year in the county-level time series. As a result, ZIP-level employment data for all years changes each year when DBUSA is updated. Therefore, each new year's ZIP-level data is a snapshot rather than a time series and should be treated as such. Since DBUSA is a volatile dataset, volatility in employment between dataruns in which DBUSA is updated is expected.
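The snapshot behavior can be illustrated with a short sketch. The function name, the years, and all figures are invented for illustration. The point is only that one set of current shares is applied to every year, so updating the shares rewrites the whole history.

```python
def zip_series(county_jobs_by_year, current_zip_shares):
    """Apply one set of current ZIP shares to every year of a county-level series."""
    return {
        year: {zip_code: jobs * share for zip_code, share in current_zip_shares.items()}
        for year, jobs in county_jobs_by_year.items()
    }


county_series = {2021: 200, 2022: 240}  # hypothetical county-industry jobs by year

earlier_run = zip_series(county_series, {"ZIP_A": 0.5, "ZIP_B": 0.5})
later_run = zip_series(county_series, {"ZIP_A": 0.75, "ZIP_B": 0.25})

# The 2021 figure for ZIP_A differs between runs even though the county figure did not change.
print(earlier_run[2021]["ZIP_A"], later_run[2021]["ZIP_A"])
# 100.0 150.0
```

Comparing ZIP figures across two dataruns therefore mixes real employment change with changes in the listing-based shares.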