
Lightcast’s Canada Data Process Overview

How Lightcast collects and combines LMI data sources and dimensions for Canada

Lightcast’s Canadian dataset incorporates and harmonizes labour market data from SEPH, LFS, CBP, Census, and PSIS, providing it in a format that is easy to understand, easy to access, and easy to use. By combining these disparate datasets into one master set, the strengths of each can compensate for the weaknesses of the others. The data reaches back to 2001 and is projected forward to 2033.

The Canada Analyst tool is updated twice a year with new data from various sources, giving our users access to the most current information. Lightcast data provides information for 305 detailed industries, classified using the NAICS 2022 system; 514 detailed occupations, classified using the NOC 2021 system; and 2,120 educational programs within the CIP classification system. All of these classifications are provided for 5,161 detailed geographical areas.

Current Data Sources

Canadian Business Patterns (CBP)

  • Establishment Counts by Industry, CSD

  • Location Counts by Industry, CSD

Census and National Household Survey (2001, 2006, 2011, 2016, 2021)

  • Workplace-based: Earnings by Class of Worker, Industry, CD

  • Workplace-based: Employment by Class of Worker, Industry, CSD

  • Workplace-based: Employment by Class of Worker, Industry, Occupation, Province

Survey of Employment, Payroll, and Hours (SEPH)

  • Annual Employment by Industry, Province/Territory

  • Annual Weekly Earnings by Industry, Province/Territory

  • Monthly Employment by Industry, Province/Territory

  • Monthly Weekly Earnings by Industry, Province/Territory

Labour Force Survey (LFS)

  • Annual Employment by Occupation, Class of Worker, Economic Region

  • Annual Employment by Industry, Economic Region

  • Annual Employment/Earnings (two-year rolling averages), Occupation, Employees, Economic Region

Canadian Occupation Projection System (COPS)

  • Industry Employment Projections, Canada

  • Occupation Employment Projections, Canada

Demographics

  • CANSIM 17-10-0152-01 Historic Age/Gender, CD

  • CANSIM 17-10-0153-01 Historic Population Components, CD

  • CANSIM 13-10-0418-01 Fertility Rates

  • CANSIM 13-10-0710-01 Death Rates

Postsecondary Student Information System (PSIS)

  • Enrollments and Completions by Award Level, Program, Institution, CSD

Data Classification Systems

North American Industry Classification System (NAICS) 2022

The Lightcast dataset currently uses NAICS 2022, which aligns with the NAICS version used by SEPH.

National Occupational Classification (NOC) 2021

The Lightcast dataset currently uses NOC 2021, the version being adopted by the Census.

Lightcast Industry Data

Industry Location Counts

Industry location counts are taken directly from Canadian Business Patterns (CBP) without modification.

Industry Employee Counts

Canada has multiple sources of employment data by industry, but Lightcast considers SEPH to be the best source of employee counts and employee earnings by industry. Other sources are incorporated, but SEPH is treated as the primary source and other figures are adjusted to it. At its most detailed, SEPH provides 4-digit NAICS data by province/territory. Because some values in the SEPH dataset are suppressed (undisclosed by the government to protect confidentiality), Lightcast uses a proprietary process to estimate the suppressed values.

Supplementing SEPH

SEPH does not cover all employees in Canada, nor does it provide the desired level of geographical detail. Data from the Census and CBP are combined with the SEPH data to fill in details for employees in agriculture, fishing and trapping, private household services, religious organizations, and military personnel of defence services. These datasets are also used to disaggregate SEPH data down to the census subdivision (CSD) for all industries.
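
As a rough illustration of what disaggregation means here, the sketch below splits a provincial total across CSDs in proportion to a set of weights, such as Census workplace-based employment. Lightcast's actual process is not described in this article, so the function name `disaggregate`, the CSD labels, and all figures are hypothetical, and simple proportional allocation is an assumption.

```python
def disaggregate(province_total, csd_weights):
    """Split a provincial total across CSDs in proportion to the given weights.

    Assumption: simple proportional allocation. The weights could be, for
    example, Census employment counts for the same industry in each CSD.
    """
    weight_sum = sum(csd_weights.values())
    if weight_sum == 0:
        raise ValueError("weights must not sum to zero")
    return {csd: province_total * w / weight_sum for csd, w in csd_weights.items()}


# Hypothetical figures: 1,000 provincial employees in one industry, and
# Census employment of 300, 100 and 100 in three CSDs.
csd_estimates = disaggregate(1000.0, {"CSD_A": 300, "CSD_B": 100, "CSD_C": 100})
```

By construction the CSD estimates sum back to the provincial figure, which is the property that keeps the detailed data consistent with the SEPH benchmark.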

Employee Earnings

SEPH contains employee earnings for all industries by province and territory. Industry employee earnings are further regionalized to the CSD/CD level using Census data.

Employee Projections

To create industry employment projections, Lightcast builds three linear regressions using historic employee counts for each geography. The regressions use the most recent 3, 5, and 10 years of historic data. The average of these linear regressions is taken, and the results are damped to curb excessive growth and decline. All trends are then adjusted to the trends of higher geography levels (CSD adjusted to CD, CD to Province, Province to Nation). The resulting trend is our base projection. We then adjust the base projection's annual growth rate by industry to the projections produced by COPS. This completes our industry employee count process, creating CSD-level data for 2001-2033.
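
The regression-averaging step can be sketched as follows. This is a minimal illustration, not Lightcast's implementation: the article does not say how the damping is done, so the geometric `damping` factor below is an assumption, as is reading "3, 5, and 10 years" as the last 3, 5, and 10 observations. The adjustments to higher geography levels and to COPS are omitted, and the names `linear_fit` and `base_projection` are hypothetical.

```python
def linear_fit(values):
    """Least-squares line through (0, v0), (1, v1), ...; returns (intercept, slope)."""
    n = len(values)
    x_mean = (n - 1) / 2
    y_mean = sum(values) / n
    sxx = sum((x - x_mean) ** 2 for x in range(n))
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in enumerate(values))
    slope = sxy / sxx
    return y_mean - slope * x_mean, slope


def base_projection(history, horizon, windows=(3, 5, 10), damping=0.9):
    """Project `horizon` years ahead from the average of several linear trends.

    Assumption: damping shrinks each successive yearly increment by a constant
    factor, and projected counts are not allowed to fall below zero.
    """
    if len(history) < 2:
        raise ValueError("need at least two historic observations")
    slopes = [linear_fit(history[-w:])[1] for w in windows]
    avg_slope = sum(slopes) / len(slopes)

    projected = []
    value = history[-1]
    for year in range(1, horizon + 1):
        value = max(value + avg_slope * damping ** year, 0.0)
        projected.append(value)
    return projected


# Hypothetical employee counts for ten historic years.
history = [100, 110, 120, 130, 140, 150, 160, 170, 180, 190]
projection = base_projection(history, horizon=3)
```

Averaging trends over short and long windows balances recent momentum against the longer-run pattern, while the damping keeps a steep recent trend from compounding indefinitely.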

Industry Self-Employment Counts

Data for the self-employed is less readily available than employee data. SEPH and CBP contain no data on self-employed persons, so Lightcast gathers this data from the Labour Force Survey (LFS) and the Census. LFS is the benchmark dataset in this case, as the Census undercounts the number of self-employed by the nature of the questions it asks.[1] Lightcast only provides worker counts for the self-employed; there is no earnings data available. Lightcast projects the self-employed counts in the same way the employee data is projected, except for the adjustments: for self-employed data, the only adjustment made is to the overall projected growth rate of the economy at large. This completes Lightcast’s self-employment data process, which provides self-employed worker counts (not earnings) at the CSD level for 2001-2033.

Lightcast Occupation Data

Occupation data is generally less reliable than industry data. Industry data is more easily tied to Business Registers and to businesses, which typically classify themselves by industry accurately, whereas occupation data is usually collected from individuals and is more prone to error. For these reasons, we treat industry data as the more reliable of the two and adjust occupation data to it.

Geographic Occupation Counts

Occupation data is a combination of two processes. The first is the establishment of fixed occupation counts at the higher geography levels. The second is the formation of staffing patterns for industries at these same geographic levels. These staffing patterns, in combination with the industrial mix at lower geography levels, then determine the occupational makeup of lower-level geographies (e.g. CSDs).

Lightcast begins with 4-digit NOC Labour Force Survey employment and earnings figures at the Economic Region geographical level. This dataset contains undisclosed values (suppressions), which Lightcast fills in using Census data as an initial estimate.

The undisclosed values for earnings are filled in using a separate process that incorporates industry earnings and occupation earnings from a higher level of geography. This process yields a full-series 4-digit NOC breakout at the economic region level. These estimates are then disaggregated to the CSD level using Census, smoothed to account for volatility present in LFS, and adjusted to SEPH totals so that occupation job counts and earnings match industry job counts and earnings.

The occupation job counts data is then projected using the same projection methodology described above in the industry employee process. After this base projection is created, its annual growth rate is adjusted by occupation to the occupation projections produced by COPS. These projections are then adjusted so that the projected occupation totals match the projected industry employment totals. The result of these processes is Lightcast occupation employment and earnings data by Economic Region.

Occupation Staffing Patterns

The second part of the occupation process creates staffing patterns for each economic region. After the staffing patterns are formed, CSD-level industry data is “staffed” into occupations at the CSD level using the staffing patterns created for the higher-level geography. Average hourly earnings at the Economic Region level by occupation are then applied to CSD-level data (earnings data by occupation is problematic below the Economic Region level). This forms Lightcast’s occupation employee dataset at the CSD level.
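
The "staffing" step can be pictured as multiplying each industry's job count by that industry's occupational shares and summing the results by occupation. The sketch below assumes a staffing pattern is stored as shares that sum to 1 for each industry; the function name `staff_industries`, the industry and occupation labels, and all figures are hypothetical.

```python
def staff_industries(industry_jobs, staffing_pattern):
    """Convert industry job counts into occupation job counts.

    industry_jobs: {industry: jobs} for one CSD.
    staffing_pattern: {industry: {occupation: share}} from the higher-level
    geography, where each industry's shares are assumed to sum to 1.
    """
    occupation_jobs = {}
    for industry, jobs in industry_jobs.items():
        for occupation, share in staffing_pattern.get(industry, {}).items():
            occupation_jobs[occupation] = occupation_jobs.get(occupation, 0.0) + jobs * share
    return occupation_jobs


# Hypothetical Economic Region staffing pattern applied to one CSD.
pattern = {
    "IND_1": {"OCC_A": 0.75, "OCC_B": 0.25},
    "IND_2": {"OCC_B": 0.5, "OCC_C": 0.5},
}
csd_occupations = staff_industries({"IND_1": 200, "IND_2": 100}, pattern)
```

Because the shares for each industry sum to 1, total occupation jobs in the CSD equal total industry jobs, so the two views of the same workforce stay consistent.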

Occupation Self-Employment Process

The self-employment occupation process follows the employee occupation process very closely, with a few minor alterations. First, self-employment occupation data margins are established at the Province level rather than at the Economic Region level, as the data is highly suppressed at the Economic Region level. Second, the self-employment occupation data is not adjusted to COPS occupation projections. Third, staffing patterns are created at the Province level rather than at the Economic Region level. Finally, earnings figures are unavailable for self-employed workers by occupation.

Demographics Data

Lightcast provides population counts by age and gender at the Census Division level. All historic period data is published by Lightcast as delivered by StatCan. For projected years, Lightcast uses a traditional cohort model, which accounts for births, deaths, in-migration and out-migration at the Census Division level. The results of the cohort model are adjusted to provincial population projection estimates published by StatCan.
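
For readers unfamiliar with cohort models, the sketch below advances a population by one year. It is heavily simplified and is not Lightcast's model: it assumes single-year age groups with an open-ended oldest group, ignores the gender split, applies fertility rates to the whole population of each age, treats migration as a net count per age, and skips the final adjustment to provincial projections. The function name `project_one_year` and all figures are hypothetical.

```python
def project_one_year(population, death_rates, fertility_rates, net_migration):
    """Advance a population by one year with a simplified cohort model.

    All lists are indexed by age group; the last group is open-ended, so its
    survivors stay in it and are joined by survivors of the group below.
    """
    if len(population) < 2:
        raise ValueError("need at least two age groups")
    survivors = [p * (1 - d) for p, d in zip(population, death_rates)]
    births = sum(p * f for p, f in zip(population, fertility_rates))
    # Births enter the youngest group; every other cohort ages by one group.
    aged = [births] + survivors[:-2] + [survivors[-2] + survivors[-1]]
    # Net migration (in-migration minus out-migration) is added per age group.
    return [max(p + m, 0.0) for p, m in zip(aged, net_migration)]


# Hypothetical Census Division with three age groups.
next_year = project_one_year(
    population=[100.0, 100.0, 100.0],
    death_rates=[0.0, 0.0, 0.5],
    fertility_rates=[0.0, 0.1, 0.0],
    net_migration=[0.0, 5.0, -5.0],
)
```

Repeating this step year by year carries each birth cohort forward through the age structure, which is why the approach captures ageing effects that a simple trend on total population would miss.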

Education Data

Lightcast provides completion counts by institution and educational program. Completions data is produced by unsuppressing the PSIS enrollment and graduate datasets separately, and then combining them into one set.



[1] LFS estimates of self-employed workers in 2016 exceed the corresponding Census counts by 1.2 million.
