The Titles Classifier takes in raw title information as an input and classifies that data to one of over 70,000 Lightcast Titles. A raw title is a title that has been pulled directly from an outside source such as a job posting or resumé, while a Lightcast Title is a curated list of standardized titles derived from raw titles.
This Titles Classification process uses a vector-based machine learning model that can take a raw job title and find the best match to a defined Lightcast Title. Prior to vectorization, the raw titles are normalized to remove irrelevant information from the raw title. The Lightcast Titles Classifier is trained using a combination of raw job title data and the entire list of Lightcast Titles. This training set allows our model to recognize duplicate and distinct titles by learning the relationship between a title and acronyms, semantic variations, and abbreviations.
The classifier can return the top Lightcast Titles in the taxonomy with the highest similarity score to the raw title being classified.