Lightcast Data Collection and Processing Methodology

The methodology used to obtain job advertisements from online job boards and company websites is based on Lightcast's advanced “spidering” technology. Once Lightcast identifies an online site as a valid source of employment opportunities, a dedicated spider is programmed, tested, and activated. The spider visits the site regularly and pulls information for all jobs posted; the information is then stored in a database. The sites with the newest jobs or with the highest frequency of change in postings are visited most frequently. Lightcast spiders more than 65,000 sites worldwide.
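The idea of visiting frequently changing sites more often can be sketched as a priority queue with an adaptive revisit interval. This is a minimal illustration only, not Lightcast's scheduler: the `SiteVisit` class, the halving and doubling factors, and the interval bounds are all assumptions chosen for the example.

```python
import heapq
from dataclasses import dataclass, field

MIN_INTERVAL_H = 1.0    # hypothetical floor on the revisit interval, in hours
MAX_INTERVAL_H = 168.0  # hypothetical ceiling (one week)


@dataclass(order=True)
class SiteVisit:
    next_visit_h: float                       # when the site is next due, in hours
    url: str = field(compare=False)
    interval_h: float = field(compare=False, default=24.0)


def next_interval(interval_h: float, postings_changed: bool) -> float:
    """Halve the interval when postings changed, double it otherwise."""
    factor = 0.5 if postings_changed else 2.0
    return min(MAX_INTERVAL_H, max(MIN_INTERVAL_H, interval_h * factor))


def visit_next(queue: list, postings_changed: bool) -> SiteVisit:
    """Pop the site that is due soonest, then reschedule it."""
    site = heapq.heappop(queue)
    site.interval_h = next_interval(site.interval_h, postings_changed)
    site.next_visit_h += site.interval_h
    heapq.heappush(queue, site)
    return site
```

In this sketch, a site whose postings keep changing drifts toward the minimum interval, while a static site drifts toward the maximum, so crawl effort concentrates on the most active sources.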

Once postings are collected, Lightcast technologies parse, extract, and code dozens of data elements, including the following: job title (which is used to map to an occupation); employer (which is used to assign an industry code); detailed data about the specific skills, skill clusters, educational credentials, certifications, experience levels, and work activities required for a specific job; and data about salary, number of openings, and job type. The high level of detail enables users to look beyond summary statistics to discover the specific skills in demand, which job seekers can then identify and acquire if needed.
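Mapping a job title to an occupation can be pictured as an ordered list of pattern rules, where the first matching rule wins. The sketch below is illustrative only: the patterns and the occupation labels are hypothetical placeholders, not Lightcast rules or official occupation codes.

```python
import re

# Hypothetical rules: (pattern, occupation label), checked in order,
# so more specific patterns must come before more general ones.
RULES = [
    (re.compile(r"\b(registered nurse|rn)\b"), "registered-nurse"),
    (re.compile(r"\bsoftware (engineer|developer)\b"), "software-developer"),
    (re.compile(r"\b(engineer|developer)\b"), "engineer-other"),
]


def assign_occupation(job_title: str, default: str = "unclassified") -> str:
    """Return the label of the first rule whose pattern matches the title."""
    title = job_title.lower()
    for pattern, occupation in RULES:
        if pattern.search(title):
            return occupation
    return default
```

Ordering matters here: "Senior Software Engineer" matches the software rule before it can fall through to the generic engineer rule.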

Lightcast takes particular care in coding occupation and industry, the two fields of greatest interest to policymakers. For example, we have developed a curated set of tens of thousands of business rules to assign each job to the best-fit government occupation and the best-fit Lightcast occupation.

Lightcast takes a robust approach to data collection. Sources include direct employer sites as well as job boards, aggregators, government sites, and free sites. This broad coverage helps ensure that the data are representative of the labor market, and the data are updated in real time, with new advertisements identified, processed, and added to the database each day. Additionally, Lightcast has strong mechanisms in place to deduplicate job advertisements. Roughly 80% of all postings collected are discarded as duplicates, so the data reflect new, unique opportunities rather than the broad-brush posting activity undertaken by recruiting and staffing agencies.
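One simple way to think about deduplication is to normalize the fields that identify an opening and keep only the first posting seen for each normalized fingerprint. The source does not describe Lightcast's actual mechanism, so the sketch below is an assumption-laden illustration: the field names, the normalization rule, and the exact-match hashing are all hypothetical, and a production system would likely also need fuzzy matching.

```python
import hashlib
import re

# Hypothetical set of fields assumed to identify a job opening.
IDENTITY_FIELDS = ("title", "employer", "location", "description")


def fingerprint(posting: dict) -> str:
    """Hash the normalized identity fields of a posting."""
    parts = []
    for key in IDENTITY_FIELDS:
        text = str(posting.get(key, "")).lower()
        # Collapse punctuation and whitespace runs (ASCII-only simplification).
        text = re.sub(r"[^a-z0-9]+", " ", text).strip()
        parts.append(text)
    return hashlib.sha256("|".join(parts).encode("utf-8")).hexdigest()


def deduplicate(postings):
    """Keep the first posting seen for each fingerprint."""
    seen = set()
    unique = []
    for posting in postings:
        key = fingerprint(posting)
        if key not in seen:
            seen.add(key)
            unique.append(posting)
    return unique
```

Under this scheme, the same advertisement collected from an employer site and again from a job board collapses to one record, even if capitalization or punctuation differs between the two copies.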

The Lightcast aggregation process gives users more detailed, granular, and searchable data, which yields actionable labor market intelligence. Real-time labor data can serve as a powerful complement to traditional survey-based instruments.
