Profiles Methodology

Overview

Lightcast Profile Analytics is built from the individual profiles of approximately 331 million workers worldwide. Typical fields include city/state/nation of residence, job history, education history, and skills. Many profiles also contain names, phone numbers, and email addresses, but these are not made available in bulk to Lightcast users. Profiles data can provide a high level of detail for labor market analytics, especially with regard to worker skill sets, career paths, company-level human capital, and school-level alumni outcomes.

Data Sources

For proprietary and confidentiality/non-disclosure reasons, Lightcast cannot provide a detailed list of data sources. However, through a small number of main sources, Lightcast has aggregate access to a multitude of original sources, some of which provide thousands of profiles and others millions. Our primary sources aggregate multiple other sources and don’t always break down their own sources. Because of this, and for confidentiality reasons, we do not provide an exact source count. The total count of our unique profiles (approximately 331 million worldwide) is a much better measure of coverage than a count of sources.

How Lightcast Processes Profiles

Standardization

First, data from all sources is standardized to a common format. This is necessary to simplify processing and enable matching across sources. When a source does not supply a field that other sources provide, that field is marked as missing for profiles from that source.
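
As an illustration of the standardization step, here is a minimal Python sketch. It is not Lightcast's implementation: the common field list (COMMON_FIELDS), the per-source mappings (SOURCE_FIELD_MAPS), the source names, and the standardize function are all hypothetical, chosen only to show how fields a source does not support end up as missing values.

```python
from typing import Any, Dict, Optional

# Hypothetical common schema; the real field list is not published.
COMMON_FIELDS = ("name", "email", "phone", "location", "jobs", "education", "skills")

# Hypothetical mappings: source-specific field name -> common field name.
SOURCE_FIELD_MAPS = {
    "source_a": {"full_name": "name", "mail": "email", "city": "location", "positions": "jobs"},
    "source_b": {"name": "name", "telephone": "phone", "loc": "location", "skills": "skills"},
}


def standardize(source: str, record: Dict[str, Any]) -> Dict[str, Optional[Any]]:
    """Map one source record onto the common schema.

    Fields the source does not provide stay None (missing);
    source fields with no mapping are dropped.
    """
    mapping = SOURCE_FIELD_MAPS[source]
    out: Dict[str, Optional[Any]] = {field: None for field in COMMON_FIELDS}
    for src_field, value in record.items():
        common_field = mapping.get(src_field)
        if common_field is not None:
            out[common_field] = value
    out["source"] = source  # keep provenance for later matching and merging
    return out
```

In this sketch, a record from source_a has no phone mapping, so its standardized form carries phone as None rather than omitting the key, which keeps every record the same shape.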

Profile Matching

To match profiles with each other, we use several highly distinctive, personally identifiable fields or combinations of fields, such as name/email, name/phone, and online URL(s). Matched profiles are collected into groups, each representing a single person. Our matching method prioritizes accurate matches over finding every single duplicate.
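
One common way to collect matched profiles into groups is a union-find (disjoint-set) structure keyed on shared identifiers. The sketch below assumes that approach; the source does not say which algorithm Lightcast uses. The field names (name, email, phone, urls) and the functions match_keys and group_profiles are hypothetical. A name on its own never produces a key, which reflects the stated preference for accurate matches over catching every duplicate.

```python
from collections import defaultdict
from typing import Dict, List, Set, Tuple


def _clean(value: str) -> str:
    """Lowercase and collapse whitespace so trivial variations compare equal."""
    return " ".join(value.lower().split())


def match_keys(profile: dict) -> Set[Tuple[str, ...]]:
    """Build identifying keys; a key exists only when all of its parts are present."""
    keys: Set[Tuple[str, ...]] = set()
    name = profile.get("name")
    if name and profile.get("email"):
        keys.add(("name_email", _clean(name), _clean(profile["email"])))
    if name and profile.get("phone"):
        digits = "".join(ch for ch in profile["phone"] if ch.isdigit())
        if digits:
            keys.add(("name_phone", _clean(name), digits))
    for url in profile.get("urls") or []:
        keys.add(("url", _clean(url).rstrip("/")))
    return keys


def group_profiles(profiles: List[dict]) -> List[List[int]]:
    """Return groups of profile indices that share at least one key, transitively."""
    parent = list(range(len(profiles)))

    def find(i: int) -> int:
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    def union(a: int, b: int) -> None:
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[max(ra, rb)] = min(ra, rb)

    first_seen: Dict[Tuple[str, ...], int] = {}
    for idx, profile in enumerate(profiles):
        for key in match_keys(profile):
            if key in first_seen:
                union(idx, first_seen[key])
            else:
                first_seen[key] = idx

    groups: Dict[int, List[int]] = defaultdict(list)
    for idx in range(len(profiles)):
        groups[find(idx)].append(idx)
    return sorted(groups.values())
```

Because grouping is transitive, a profile that shares a name/email pair with one record and a URL with another pulls all three into the same group.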

Field Merging

Within each profile group that represents a single person’s “master” profile, we then need to merge the data for each field from multiple source profiles. We use customized similarity functions for each field to determine if a location, job, or degree is a duplicate of one already seen, then merge them together into a single sequence per master profile.
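
The merging step can be sketched as follows for job history. This is an illustration under stated assumptions, not the production logic: the similarity function here uses Python's difflib.SequenceMatcher, and the 0.85 threshold, the same_job rule (same start year plus similar company and title), and the field names are all hypothetical stand-ins for the customized per-field functions described above.

```python
from difflib import SequenceMatcher
from typing import List


def text_similarity(a: str, b: str) -> float:
    """Case-insensitive similarity ratio between two strings, from 0.0 to 1.0."""
    return SequenceMatcher(None, a.lower().strip(), b.lower().strip()).ratio()


def same_job(a: dict, b: dict, threshold: float = 0.85) -> bool:
    """Hypothetical rule: same start year, and similar company and title text."""
    return (
        a.get("start_year") == b.get("start_year")
        and text_similarity(a["company"], b["company"]) >= threshold
        and text_similarity(a["title"], b["title"]) >= threshold
    )


def merge_jobs(source_profiles: List[List[dict]]) -> List[dict]:
    """Merge job lists from several source profiles into one de-duplicated sequence."""
    merged: List[dict] = []
    for jobs in source_profiles:
        for job in jobs:
            for seen in merged:
                if same_job(seen, job):
                    # Duplicate: fill in any values the earlier copy was missing.
                    for key, value in job.items():
                        if seen.get(key) is None:
                            seen[key] = value
                    break
            else:
                merged.append(dict(job))  # not seen before: keep a copy
    return sorted(merged, key=lambda j: j.get("start_year") or 0)
```

A duplicate does more than get discarded in this sketch: it can contribute values (such as an end year) that the first copy lacked, so the merged entry is at least as complete as any single source.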

Normalization

Finally, we classify various profile field values, such as company and school, into a smaller number of fixed categories to enable meaningful aggregation and analysis. We call this process “normalization.” An example is normalizing the free-form variations of “St. Louis, Missouri” found in different profiles to “St. Louis, MO”. One person might list their location as “Saint Louis Missouri”; another might list “ST Louis MO”; and a third might list “St. Louis Missouri”. Normalization maps all variations of this city name to “St. Louis, MO”. Without the normalization step, aggregate analysis of profiles would be impossible: a search for profiles in “St. Louis Missouri” would automatically exclude profiles where the person wrote their location as “Saint Louis Missouri”.
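
The St. Louis example can be reproduced with a small lookup-based Python sketch. It only illustrates the idea of mapping free-form variants to one canonical value; it is not the geocoding-based method described under Geographic Location. The lookup tables (STATE_ABBREVIATIONS, CITY_ALIASES) and the normalize_location function are hypothetical and cover only a handful of values.

```python
import re
from typing import Optional

# Hypothetical lookup tables covering only the values needed for this example.
STATE_ABBREVIATIONS = {"missouri": "MO", "mo": "MO", "illinois": "IL", "il": "IL"}
CITY_ALIASES = {"saint louis": "St. Louis", "st louis": "St. Louis"}


def normalize_location(raw: str) -> Optional[str]:
    """Map a free-form "city state" string to a canonical "City, ST" value.

    Returns None when the city or state is not recognized.
    """
    # Treat periods and commas as separators, then compare case-insensitively.
    tokens = re.sub(r"[.,]", " ", raw).lower().split()
    if len(tokens) < 2:
        return None
    state = STATE_ABBREVIATIONS.get(tokens[-1])
    city = CITY_ALIASES.get(" ".join(tokens[:-1]))
    if state is None or city is None:
        return None
    return f"{city}, {state}"


variants = ["Saint Louis Missouri", "ST Louis MO", "St. Louis Missouri"]
normalized = {normalize_location(v) for v in variants}
```

All three variants collapse to a single value, so a search on the normalized field finds every one of these profiles.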

Geographic Location

Lightcast uses the Google Geocoding API to standardize locations to city, state, and county.

Job History

  1. Company: Lightcast normalizes the company information provided in the profile to a company in our extensive database of company names. This information also provides the industry.

  2. Lightcast Job Title: Lightcast standardizes the job title on the profile with our Lightcast Titles taxonomy.

  3. Lightcast Occupation or O*NET Code: Lightcast assigns an occupation based on the job title and job description text.

Where a profile lists multiple jobs (for instance, a person who has worked for several companies over a period of 5 years), we normalize and display each job in our data share products.

In Profile Analytics, customers see only the most recent job held on each profile since 2018.

Customers using US or Canada profile data can adjust for when a profile was last updated by using the Profile Recency filter.

Education History

  1. School: Lightcast maintains a proprietary database of over 20,000 postsecondary institutions (including distinct campuses and some sub-departments/colleges/schools) and an algorithm that normalizes the school name provided in the profile.

  2. Degree Level: Lightcast maintains a classifier that converts freeform text to one of four standard education levels (associate’s, bachelor’s, master’s, doctorate/professional).

  3. Field of Degree (CIP code): Lightcast maintains a classifier that converts freeform text naming a college major to a CIP 2010 code.

Skills

Lightcast maintains a set of over 33,000 recognized skills and a context-aware extraction tool that identifies these skills in profile text.

Filtering for “Usable” Profiles:

After normalization, we apply some filters to profiles before displaying them in Lightcast products. Profiles are filtered out if they meet one or more of these conditions:

  • We do not know the person’s nation of residence

  • The profile does not have a job history, education history, or skills

  • The profile is older than 2018-01-01 (this removes outdated profiles)

  • The profile language is other than English or Spanish
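
The four conditions above can be sketched as a single Python predicate. The field names (nation, jobs, education, skills, last_updated, language) and the is_usable function are hypothetical, and the sketch makes three labeled assumptions: the second condition is read as "has none of the three", profile age is measured by a last-updated date, and a profile with no such date is treated as outdated.

```python
from datetime import date

CUTOFF = date(2018, 1, 1)           # from the filter list above
SUPPORTED_LANGUAGES = {"en", "es"}  # English or Spanish; the codes are an assumption


def is_usable(profile: dict) -> bool:
    """Return True only if the profile passes all four filters."""
    # 1. Nation of residence must be known.
    if not profile.get("nation"):
        return False
    # 2. Needs at least one of job history, education history, or skills
    #    (assumption: the profile is dropped only when all three are empty).
    if not (profile.get("jobs") or profile.get("education") or profile.get("skills")):
        return False
    # 3. Must not be older than the cutoff (assumption: age is judged by the
    #    last-updated date, and a missing date counts as outdated).
    last_updated = profile.get("last_updated")
    if last_updated is None or last_updated < CUTOFF:
        return False
    # 4. Profile language must be English or Spanish.
    if profile.get("language") not in SUPPORTED_LANGUAGES:
        return False
    return True
```

Applied to a list, the filter is a one-liner: usable = [p for p in profiles if is_usable(p)].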

For Alumni Outcomes and GoRecruit, Lightcast matches institutional data to this version of the profile database.
