Workforce Estimation Model (WEMo) | Lightcast Knowledge Base

Overview

Lightcast estimates the total size of a workforce by region and occupation in our Global dataset, which is available in the Talent Analyst software and API. This is done using our proprietary Workforce Estimation Model (WEMo).

The model is powered by various government data sources and incorporates labor market patterns from job postings data to estimate talent pools at a granular level.

There are inherent challenges to collecting and organizing labor market data on a global scale. These challenges include national differences in:

language
data availability
update cadences
job categories & definitions
region categories & definitions

The WEMo incorporates methodologies and taxonomies to handle these challenges in order to provide accurate and comparable workforce estimates across regions. A full list of countries that the WEMo supports can be found in the Available Countries in Global table.

Data Sources

The model uses government data as its primary source, and it incorporates patterns from job postings data as a secondary source.

Lightcast collects workforce data from local government agencies and international organizations in order to get the most reliable workforce data available, regardless of its original format. Each source is evaluated for its breadth of coverage (does it capture the whole economy?), its recency (how recent is the data?), and its collection methodology (sample size, etc.). To learn about the specific data source(s) that we collect for each country, see the full breakdown in our country methodology table.

Lightcast also collects job postings from a variety of global sources daily. We capture as many primary sources as possible, such as applicant tracking systems and direct postings by employers on job boards, and augment them with high-quality secondary sources. After being collected, the job postings are translated and classified so they can easily be analyzed in aggregate.

Finally, we bring the data together and classify it according to different taxonomies, such as the International Standard Classification of Occupations (ISCO), the Lightcast Occupation Taxonomy (LOT), and the Lightcast Administrative Areas (LAA).

Methodology

Below is the method Lightcast uses to develop workforce estimates. This does not include the methodology for creating the Lightcast Occupation Taxonomy (LOT) or Lightcast Administrative Areas (LAA). While these two global taxonomies are essential for the WEMo, they are also used independently in other contexts.

First, Lightcast collects the most reliable workforce data from local government agencies and international organizations. We then develop in-house crosswalks to bridge the gap between the local job taxonomies and the ISCO taxonomy. This allows us to standardize job categories and definitions across countries, ensuring that the data is comparable on an international scale.
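A minimal Python sketch of applying such a crosswalk, assuming it is stored as a mapping from each local occupation code to one or more 4-digit ISCO codes, with weights that sum to 1 and are used to apportion workers when a local code splits across several ISCO groups. The local codes (`LOCAL-101`, etc.), the weights, and the counts are hypothetical. This article does not describe the actual structure of Lightcast's in-house crosswalks.

```python
from collections import defaultdict

# Hypothetical crosswalk from local occupation codes to 4-digit ISCO-08 codes.
# Assumption: weights for each local code sum to 1.0, and a code that splits
# across several ISCO groups has its workforce apportioned by those weights.
CROSSWALK = {
    "LOCAL-101": [("2512", 1.0)],                 # one-to-one
    "LOCAL-102": [("2512", 0.6), ("2514", 0.4)],  # one-to-many split
    "LOCAL-205": [("3512", 1.0)],
}


def to_isco(local_counts, crosswalk):
    """Re-key workforce counts from local occupation codes to ISCO codes."""
    isco_counts = defaultdict(float)
    for local_code, count in local_counts.items():
        if local_code not in crosswalk:
            raise KeyError(f"no crosswalk entry for {local_code!r}")
        for isco_code, weight in crosswalk[local_code]:
            isco_counts[isco_code] += count * weight
    return dict(isco_counts)


# Hypothetical workforce counts from a national statistics source.
local_workforce = {"LOCAL-101": 10_000, "LOCAL-102": 5_000, "LOCAL-205": 2_000}
isco_workforce = to_isco(local_workforce, CROSSWALK)
print(isco_workforce)
```

Keeping the weights explicit makes it easy to check that no workers are lost or double-counted: the ISCO totals add up to the same figure as the local totals.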

While the ISCO taxonomy offers several advantages, it is less granular than other classification systems. For instance, it includes 400+ occupational categories, whereas the US SOC taxonomy covers 800+ and LOT extends to 1,900+ categories.

So, after Lightcast establishes a crosswalk between a local taxonomy and the broader ISCO taxonomy, we model ISCO down to the more granular LOT taxonomy. This is done by applying distribution patterns derived from job postings data specific to each country. Let's break that down in more detail.

Lightcast assigns each granular LOT Specialized Occupation to a broader 4-digit ISCO occupation. Using job postings from the past three years, classified by LOT Specialized Occupations and specific to each region, we then calculate ratio distributions for every LOT Specialized Occupation relative to its corresponding 4-digit ISCO occupation in each region.
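The ratio step can be sketched in Python as follows. The sketch assumes that postings arrive as `(region, lot_occupation, posted_date)` tuples and approximates "the past three years" as 3 × 365 days before a reference date. The `LOT_TO_ISCO` mapping, the region name, and the posting counts are hypothetical illustrations, not Lightcast data.

```python
from collections import Counter
from datetime import date, timedelta

# Hypothetical assignment of LOT Specialized Occupations to a 4-digit ISCO code
# (2512 is "Software developers" in ISCO-08).
LOT_TO_ISCO = {
    "Java Developer": "2512",
    "Python Developer": "2512",
    "Front-End Developer": "2512",
}


def lot_ratios(postings, lot_to_isco, as_of, years=3):
    """Share of each LOT occupation's postings within its ISCO group, per region.

    postings: iterable of (region, lot_occupation, posted_date) tuples.
    """
    # Approximate look-back window; ignores leap days.
    cutoff = as_of - timedelta(days=365 * years)

    # Count postings per (region, LOT occupation) inside the window.
    lot_counts = Counter(
        (region, lot)
        for region, lot, posted in postings
        if cutoff <= posted <= as_of
    )

    # Total postings per (region, ISCO group) across all of its LOT occupations.
    isco_totals = Counter()
    for (region, lot), n in lot_counts.items():
        isco_totals[(region, lot_to_isco[lot])] += n

    # Each LOT occupation's ratio relative to its ISCO group in that region.
    return {
        (region, lot): n / isco_totals[(region, lot_to_isco[lot])]
        for (region, lot), n in lot_counts.items()
    }


as_of = date(2024, 6, 30)
sample_postings = (
    [("Region A", "Java Developer", date(2024, 3, 1))] * 6
    + [("Region A", "Python Developer", date(2023, 9, 1))] * 3
    + [("Region A", "Front-End Developer", date(2022, 1, 15))] * 1
    + [("Region A", "Java Developer", date(2019, 1, 1))] * 5  # outside the window
)
ratios = lot_ratios(sample_postings, LOT_TO_ISCO, as_of)
print(ratios)  # Java 0.6, Python 0.3, Front-End 0.1
```

Within each region and ISCO group, the ratios sum to 1. Each ISCO group's workforce is therefore split across its LOT occupations without adding or removing workers.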

Finally, we apply these ratios to the workforce data to estimate the workforce size for each LOT job category in every region. The output is presented as a range (low, middle, high) to reflect the confidence in the estimate. For example, the number of Java Developers in Germany is estimated to be at least 27,018 (low), around 29,947 (middle), and up to 32,876 (high).
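Applying the ratios amounts to multiplying each region's 4-digit ISCO workforce by the LOT occupation's share. The sketch below uses hypothetical workforce counts and ratios and computes only a single point estimate. This article does not describe how Lightcast derives the low and high bounds of the range, so the sketch does not model them.

```python
# Hypothetical 4-digit ISCO workforce counts per region (e.g. after crosswalking).
isco_workforce = {("Region A", "2512"): 500_000}

# Hypothetical LOT-to-ISCO assignment and posting-based ratios for that region.
LOT_TO_ISCO = {"Java Developer": "2512", "Python Developer": "2512"}
ratios = {
    ("Region A", "Java Developer"): 0.06,
    ("Region A", "Python Developer"): 0.04,
}


def estimate_lot_workforce(isco_workforce, ratios, lot_to_isco):
    """Point estimate per (region, LOT occupation) = ISCO workforce x ratio."""
    estimates = {}
    for (region, lot), share in ratios.items():
        isco_total = isco_workforce.get((region, lot_to_isco[lot]))
        if isco_total is None:
            # No workforce figure for this region/ISCO group: skip rather than guess.
            continue
        estimates[(region, lot)] = isco_total * share
    return estimates


estimates = estimate_lot_workforce(isco_workforce, ratios, LOT_TO_ISCO)
for (region, lot), value in estimates.items():
    print(f"{region} / {lot}: {value:,.0f}")
```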

In summary, the model utilizes government workforce data and ratio distributions based on job postings to produce workforce estimates by region and occupation.

Confidence Levels

The WEMo output is presented as a range (low, middle, high) to reflect the confidence in the data estimate. The confidence is determined by the quality of the data that feeds into the model. Factors include the amount of data available, the recency of the data available, and any major market variances.

Lightcast measures the width of the range as a distance and buckets it into five confidence levels for quick assessment and comparability. A smaller range means higher confidence, while a larger range means lower confidence. The exact formula used is distance = (high - mid) / mid, i.e. the gap between the high and middle estimates relative to the middle estimate.
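As a worked example, here is the formula applied to the Java Developers estimate for Germany given earlier (middle 29,947; high 32,876):

$$
\text{distance} = \frac{\text{high} - \text{mid}}{\text{mid}} = \frac{32{,}876 - 29{,}947}{29{,}947} = \frac{2{,}929}{29{,}947} \approx 0.098
$$

A distance of about 0.098 falls between .061 and .137, which corresponds to level 4, "high confidence".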

Level	Distance	Confidence Term
5	`<.061`	"extreme confidence"
4	between `.061` & `.137`	"high confidence"
3	between `.136` & `.226`	"moderate confidence"
2	between `.225` & `.363`	"marginal confidence"
1	`>.363`	"minimal confidence"
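The bucketing in the table above can be sketched in Python as follows. The published bands overlap slightly at their edges (.136 to .137 and .225 to .226). This sketch therefore assumes that each band's upper bound is exclusive and checks the bands from narrowest to widest. The function names are hypothetical and do not correspond to a Lightcast API.

```python
# Upper bounds taken from the confidence table. Assumption: each upper bound is
# exclusive, and bands are checked from narrowest (level 5) to widest.
CONFIDENCE_BANDS = [
    (0.061, 5, "extreme confidence"),
    (0.137, 4, "high confidence"),
    (0.226, 3, "moderate confidence"),
    (0.363, 2, "marginal confidence"),
]


def range_distance(mid: float, high: float) -> float:
    """distance = (high - mid) / mid"""
    if mid <= 0:
        raise ValueError("mid estimate must be positive")
    return (high - mid) / mid


def confidence_level(mid: float, high: float) -> tuple[int, str]:
    """Bucket a (mid, high) pair into one of the five confidence levels."""
    distance = range_distance(mid, high)
    for upper, level, term in CONFIDENCE_BANDS:
        if distance < upper:
            return level, term
    return 1, "minimal confidence"


print(confidence_level(29_947, 32_876))  # (4, 'high confidence')
```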

The model is not designed to produce an estimate at all costs: if we are not confident in the data available for a region, we do not produce an estimate. As a result, some countries do not have WEMo results. While the majority of countries in our Global dataset have workforce estimates, some lack reliable data sources for Lightcast to incorporate into the model. A full list of countries that are compatible with WEMo can be found in the Available Countries in Global table.
