Data Collection and Processing Methodology

The methodology used to obtain job advertisements from publicly available online job boards and company websites is based on Lightcast's advanced scraping technology. Once Lightcast identifies an online site as a valid source of employment opportunities, a dedicated spider is programmed, tested, and activated. The spider visits the site regularly, pulls information for all posted jobs, and stores it in a database. Sites with the newest jobs, or with the highest frequency of change in postings, are visited most frequently.
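For illustration only, the sketch below shows one way a crawler could prioritise revisits by how often a site's postings change. The Source structure, change-rate metric, and interval bounds are assumptions for the example, not Lightcast's implementation.

```python
# Hypothetical sketch: revisit sources more often when their postings change frequently.
from dataclasses import dataclass, field
from datetime import datetime, timedelta

@dataclass
class Source:
    url: str
    change_rate: float  # observed postings changed per day (assumed metric)
    last_visited: datetime = field(default_factory=datetime.utcnow)

def next_visit(source: Source) -> datetime:
    """Higher change rates produce shorter revisit intervals (bounds are assumed)."""
    hours = max(1.0, min(24 * 7, 24.0 / max(source.change_rate, 0.01)))
    return source.last_visited + timedelta(hours=hours)

sources = [
    Source("https://example-jobboard.com/listings", change_rate=50.0),
    Source("https://example-employer.com/careers", change_rate=0.5),
]
# Visit whichever source is due soonest.
for s in sorted(sources, key=next_visit):
    print(s.url, "next visit at", next_visit(s).isoformat())
```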

Once postings are collected, Lightcast technologies parse, extract, and code dozens of data elements, including: job title (used to map to an occupation); employer (used to assign an industry code); detailed data about the specific skills, skill clusters, educational credentials, certifications, experience levels, and work activities required for the job; as well as data about salary, number of openings, and job type. This high level of detail enables users to look beyond summary statistics to discover the specific skills in demand, and skills that job seekers can identify and acquire if needed.
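As a simplified illustration of this coding step, the sketch below assigns an occupation code from the job title and an industry code from the employer. The lookup tables, codes, and Posting structure are hypothetical placeholders, not Lightcast's taxonomies or code.

```python
# Hypothetical sketch: coding a parsed posting into structured fields.
from dataclasses import dataclass, field

# Toy lookup tables standing in for occupation and industry taxonomies (assumed).
OCCUPATION_BY_TITLE_KEYWORD = {"data analyst": "15-2051", "nurse": "29-1141"}
INDUSTRY_BY_EMPLOYER = {"Acme Hospital": "622110"}

@dataclass
class Posting:
    title: str
    employer: str
    skills: list = field(default_factory=list)
    occupation_code: str = ""
    industry_code: str = ""

def code_posting(posting: Posting) -> Posting:
    """Assign occupation and industry codes from the title and employer."""
    title = posting.title.lower()
    for keyword, code in OCCUPATION_BY_TITLE_KEYWORD.items():
        if keyword in title:
            posting.occupation_code = code
            break
    posting.industry_code = INDUSTRY_BY_EMPLOYER.get(posting.employer, "unknown")
    return posting

example = code_posting(Posting("Senior Data Analyst", "Acme Hospital", ["SQL", "Python"]))
print(example.occupation_code, example.industry_code)
```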

Lightcast takes a robust approach to data collection. Sources include direct employer sites as well as job boards, aggregators, government sites, and free sites. This broad coverage helps ensure the data are representative of the labour market and delivered in real time, with new advertisements identified, processed, and added to the database each day. Additionally, Lightcast has strong mechanisms in place to deduplicate job advertisements.
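To show what deduplication can look like in practice, the sketch below hashes a normalised fingerprint of a few key fields and keeps only the first advertisement seen. The choice of fields and the normalisation rules are assumptions for the example; Lightcast's actual deduplication logic is not described here.

```python
# Hypothetical sketch: deduplicating postings by a normalised fingerprint.
import hashlib

def fingerprint(title: str, employer: str, location: str) -> str:
    """Build a stable key from case- and whitespace-normalised fields."""
    normalized = "|".join(" ".join(part.lower().split()) for part in (title, employer, location))
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()

postings = [
    {"title": "Data Analyst", "employer": "Acme Corp", "location": "Leeds"},
    {"title": "Data  Analyst ", "employer": "ACME Corp", "location": "Leeds"},  # duplicate
]
seen, unique = set(), []
for p in postings:
    key = fingerprint(p["title"], p["employer"], p["location"])
    if key not in seen:
        seen.add(key)
        unique.append(p)
print(len(unique))  # 1 -- the second advertisement is treated as a duplicate
```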

The Lightcast aggregation process delivers significant benefits to our users: more detailed, granular, and searchable data that translates into actionable labour market intelligence. Real-time labour data can serve as a powerful complement to traditional survey-based instruments.