Fundamentals of Lightcast Data in the US

Lightcast gathers and integrates economic, labor market, demographic, education, profile, and job posting data from dozens of government and private-sector sources, creating a comprehensive and current dataset that includes both published data and detailed estimates with full United States coverage. Industry, occupation, education, demographic, job postings, and profiles data are available at national, state, metropolitan area, and county levels. ZIP code estimates are available for employment, earnings, job change, and demographics data. Read the complete list of Lightcast data sources.

Frequency and Recency of Lightcast Data

Lightcast’s core LMI data (industry, occupation, education, demographics) is updated with the Lightcast quarterly datarun. Each datarun contains the latest data from each of Lightcast’s sources. The datarun is released early in the quarter (e.g. the Q2 datarun is typically released in April). Read more about release notes for Lightcast’s dataruns. Release notes also contain information on the age of the major sources that go into Lightcast data.

Job postings are scraped, deduplicated, and added to the system every day. For reports based on monthly timeframes, the latest month’s postings are added a few days into the following month (e.g. September postings are available a few days into October).

New profiles and updates from Lightcast’s sources for profiles are incorporated quarterly. Read more about the changelog for profiles data.

Postings and profiles data are updated every four weeks to reflect improvements in the tagging of elements such as occupation, company, industry, and skills.

Job Postings

Lightcast job postings data is gathered by scraping over 220,000 websites worldwide, including company career sites, national and local job boards, and job posting aggregators.

Lightcast applies a unique two-step approach to deduplication, which results in up to 80% of the job advertisements we collect being identified as duplicates and removed.

The first step: On a source-level basis, we use intelligence contained within the scraping spiders to identify a new advertisement for that source. The spiders refrain from collecting advertisements that have previously been aggregated.

The second step: Because the same new advertisement can appear on multiple sources, we use normalized fields, including job title, company, and location, to check whether a new advertisement matches one already found on another source. This check runs across 60 days of data to identify duplicates.

To illustrate ‘step two’, here is an example: if a job for a Marketing Specialist at Google is posted for the first time on March 1st, Lightcast treats it as the ‘original posting’. For the next 60 days, Lightcast treats any matching advertisements it finds as duplicates. In theory, if Google posts the same ad every day for an entire year on different sources, Lightcast will count it 6 times.
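
The example above can be sketched in a few lines of Python. This is only an illustration of the 60-day rule as described here, not Lightcast's actual pipeline. The function name, the tuple layout, and the choice to anchor the window at the original posting (rather than at the most recent duplicate) are assumptions.

```python
from datetime import date, timedelta

WINDOW = timedelta(days=60)  # duplicate window stated in the text


def count_unique_postings(ads):
    """Count postings kept after the step-two check.

    ads: iterable of (title, company, location, posted_date) tuples whose
    text fields are assumed to be normalized already.
    """
    last_original = {}  # normalized key -> date of the latest original posting
    unique = 0
    for title, company, location, posted in sorted(ads, key=lambda ad: ad[3]):
        key = (title, company, location)
        original = last_original.get(key)
        # An ad seen within 60 days of the original is a duplicate;
        # the first ad after that window becomes a new original.
        if original is None or posted - original > WINDOW:
            last_original[key] = posted
            unique += 1
    return unique


# The same ad reposted every day for a year, starting March 1st.
daily_ads = [
    ("marketing specialist", "google", "mountain view, ca",
     date(2023, 3, 1) + timedelta(days=offset))
    for offset in range(365)
]
yearly_count = count_unique_postings(daily_ads)
```

Under these assumptions, originals fall on days 0, 61, 122, 183, 244, and 305 of the year, which is consistent with the count of 6 given in the example.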

Each job posting is further enriched with value-add processes, including:

Job title and company standardization
Skill extraction and tagging
SOC and NAICS code determination and assignment
Education and experience determination

Read more detail on Lightcast’s Job Posting Analytics (JPA) process.

Profiles

This dataset contains profiles of individual people in the workforce. Each profile contains information unique to each individual, such as job title, company, skills, and education information.

Lightcast’s profile database currently contains profiles for over a hundred million distinct individuals. Lightcast profiles data is gathered from publicly available information on the web, third-party resume databases and job boards, the recruiting industry, opt-in data from employers and applicant tracking systems, sales and marketing CRM databases, and various consumer/identity databases.

As with job postings, machine learning algorithms are used to deduplicate profiles and enrich the raw data contained in each profile—job titles and company names are standardized, skills are extracted, and education information is standardized.

Read more information on Lightcast’s Profiles Methodology.

Industries

Industry data is the backbone of Lightcast’s core LMI data. Lightcast industry data is data about businesses, categorized by type—hospitals, oil refineries, grocery stores, etc. The Bureau of Labor Statistics’ Quarterly Census of Employment and Wages (QCEW) dataset provides detailed employment counts and earnings information for 95% of the employed workforce in the United States, broken out by industry. The employment counts data provided by this dataset are the gold standard of employment counts throughout Lightcast data. Where necessary, Lightcast fills in suppressed data points in QCEW using data from the Census’s County Business Patterns (CBP) dataset. Read more information on the extent of suppressions in QCEW and the importance of Lightcast’s unsuppression process for labor market data.

Lightcast uses other datasets to provide data for the remaining 5% of the employed workforce not covered by QCEW. Lightcast uses American Community Survey (ACS) data to provide job counts and earnings data for self-employed workers. Industry job counts and earnings data are available back to 2001.

Lightcast projects industry job counts data 10 years into the future. Three historical trendlines (last 5 years, last 10 years, last 15 years) are projected forward 10 years and averaged, yielding a raw projected trendline. This trendline is then adjusted slightly to take into account the BLS’s National Industry-Occupation Employment Matrix (NIOEM) dataset, which contains national-level employment projections. Lightcast then adjusts the trendlines to state-level projections published by state LMI offices, yielding Lightcast’s final industry projections data. Read the full explanation of the industry projections methodology. Industry earnings data are not projected.
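
As a rough illustration of the first step only (the raw projected trendline), the sketch below fits a trend to the last 5, 10, and 15 years of job counts, extends each one 10 years, and averages the three. The use of an ordinary least-squares straight line is an assumption, since the text does not say how each trendline is fitted. The function names are hypothetical, and the NIOEM and state-level adjustments are not modeled.

```python
def linear_trend(years, values):
    """Ordinary least-squares line through (year, value) points."""
    n = len(years)
    mean_x = sum(years) / n
    mean_y = sum(values) / n
    sxx = sum((x - mean_x) ** 2 for x in years)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(years, values))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def raw_projection(history, horizon=10, lookbacks=(5, 10, 15)):
    """Average several trendlines projected `horizon` years forward.

    history: dict mapping year -> job count, with at least
    max(lookbacks) consecutive years of data.
    """
    years = sorted(history)
    # Fit one line per lookback window (assumed: the last n data points).
    fits = []
    for n in lookbacks:
        window = years[-n:]
        fits.append(linear_trend(window, [history[y] for y in window]))

    last_year = years[-1]
    projected = {}
    for target in range(last_year + 1, last_year + horizon + 1):
        estimates = [slope * target + intercept for slope, intercept in fits]
        projected[target] = sum(estimates) / len(estimates)
    return projected
```

Averaging windows of different lengths balances recent momentum against the longer-run trend, so a single unusual year has less influence on the result than it would with one short window.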

Occupations

Occupation data presents employment and wage information, categorized by worker type—Registered Nurses, Welders, Web Developers, etc. Occupation job counts are generated by taking industry job counts from QCEW and combining them with staffing patterns from the BLS’s Occupational Employment and Wage Statistics (OEWS) dataset. Staffing patterns are unique to industries and show the percentage breakout of each industry into its component occupations. Lightcast regionalizes OEWS staffing patterns, creating location-specific staffing patterns that take into account the region’s particular industry mix. The result is tailored staffing patterns that generate location-specific occupation employment data.
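
The arithmetic behind staffing patterns can be sketched as follows: each industry's job count is multiplied by the share of that industry's jobs held by each occupation, and the results are summed by occupation. The function name and all of the numbers are made up for illustration. The regionalization step that Lightcast applies to the staffing patterns is not shown.

```python
def occupation_jobs(industry_jobs, staffing_patterns):
    """Distribute industry job counts across occupations.

    industry_jobs: dict mapping industry code -> job count.
    staffing_patterns: dict mapping industry code -> {occupation code: share},
    where the shares for an industry sum to at most 1.0.
    """
    totals = {}
    for industry, jobs in industry_jobs.items():
        for occupation, share in staffing_patterns.get(industry, {}).items():
            totals[occupation] = totals.get(occupation, 0.0) + jobs * share
    return totals


# Illustrative values only: job counts and shares are invented.
industry_jobs = {"622110": 1000, "621111": 400}  # hospitals, physician offices
staffing_patterns = {
    "622110": {"29-1141": 0.30, "43-6013": 0.05},
    "621111": {"29-1141": 0.10, "43-6013": 0.15},
}
regional_occupations = occupation_jobs(industry_jobs, staffing_patterns)
```

Because the occupation totals depend on both inputs, two regions with identical staffing patterns but different industry mixes will still show different occupation employment.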

Basic occupation earnings data come from OEWS as well. Lightcast unsuppresses earnings data where necessary and models the MSA-level earnings native to OEWS down to the county level. Although OEWS is not published as a time series, Lightcast has developed one using historical OEWS data. This time series offers several benefits, including historical occupation earnings back to 2005, reduced volatility between years of published OEWS data, and the ability to use historical years of OEWS to unsuppress latest year OEWS data. Read more information on Lightcast’s occupation process and historical OES time series.

In some of its products, Lightcast also provides earnings estimates for job titles layered with skills. Traditional government LMI provides earnings data for occupations, but job titles and skills are more granular than occupations. Lightcast derives estimates for job titles and skills by combining compensation data from more granular worker profiles with occupation-level earnings data from OEWS using a special compensation model. Read the compensation model documentation for more information.

Like industry employment data, occupation employment data goes back to 2001 and is also projected 10 years into the future. Projections are generated by applying projected staffing patterns to Lightcast’s projected industry employment data. Occupation earnings data are not projected.

Education

Lightcast provides data on college enrollments and graduates, as reported in the National Center for Education Statistics’ (NCES) IPEDS dataset. This includes gender and race/ethnicity data for enrollees by school; graduates by school, CIP code, award level, gender, and race/ethnicity; and data on distance completions, as well as information on tuition and other student fees.

IPEDS publishes updates to various aspects of the data throughout the year, and Lightcast incorporates the updates as they become available. Generally, new completions data is published in late summer.

For more information, read our article on the timing of IPEDS updates.

Demographics

Demographics data largely comes from the Census Bureau’s Population Estimates Program and is published by the Census Bureau down to the county level. Lightcast demographics show population breakouts by age group, gender, and race/ethnicity.

Lightcast creates estimates at the ZIP code level by using American Community Survey (ACS) data to model down to the Census Tract level, then using a tract-to-ZIP code mapping from the Department of Housing and Urban Development (HUD) to map from tracts up to ZIP codes.
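
The mapping from tracts up to ZIP codes can be sketched as a weighted sum. The example assumes a crosswalk in which each row gives the share of a tract that falls in a given ZIP code. The function name, the row layout, and the identifiers are hypothetical, and the upstream ACS-based tract model is not shown.

```python
def tracts_to_zips(tract_population, crosswalk):
    """Aggregate tract-level estimates up to ZIP codes.

    tract_population: dict mapping tract ID -> estimated population.
    crosswalk: iterable of (tract ID, ZIP code, ratio) rows, where ratio is
    the assumed share of the tract allocated to that ZIP code.
    """
    zip_population = {}
    for tract, zip_code, ratio in crosswalk:
        allocated = tract_population.get(tract, 0.0) * ratio
        zip_population[zip_code] = zip_population.get(zip_code, 0.0) + allocated
    return zip_population


# Placeholder identifiers and values, for illustration only.
tract_population = {"tract_a": 1000.0, "tract_b": 500.0}
crosswalk = [
    ("tract_a", "zip_1", 0.6),
    ("tract_a", "zip_2", 0.4),
    ("tract_b", "zip_2", 1.0),
]
zip_estimates = tracts_to_zips(tract_population, crosswalk)
```

When each tract's ratios sum to 1, the total population is preserved: the ZIP-level estimates add up to the same figure as the tract-level estimates.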

Lightcast uses a cohort model to project demographics data forward 10 years.
