Changes to Global Deduplication 6/17

What is changing?

We are publishing a new global job postings dataset with improved deduplication. The improvement is technical in nature: postings will match more reliably when the characters in company names or job titles are cased differently across two or more postings, including when those names or titles contain diacritic characters. The core deduplication methodology is not changing.
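To illustrate the kind of matching described above, here is a minimal Python sketch of a case-insensitive comparison key that also handles diacritic characters. It is not our production deduplication logic: the function name `dedup_key`, the sample postings, and the choice of Unicode NFC normalization plus `str.casefold()` are all assumptions made for illustration.

```python
import unicodedata


def dedup_key(text: str) -> str:
    """Return a case-insensitive comparison key (hypothetical helper).

    Assumption: two values should match when they differ only in letter
    case, including the case of accented letters such as "É" and "é".
    """
    # Compose characters so "e" + combining accent equals the single "é".
    composed = unicodedata.normalize("NFC", text)
    # casefold() is Unicode-aware lowercasing intended for caseless matching.
    folded = composed.casefold()
    # Re-normalize, since case folding can leave text in a non-composed form.
    return unicodedata.normalize("NFC", folded)


# Illustrative (made-up) postings as (company, title) pairs.
postings = [
    ("SOCIÉTÉ GÉNÉRALE", "INGÉNIEUR LOGICIEL"),
    ("Société Générale", "Ingénieur logiciel"),
    ("Acme GmbH", "Data Analyst"),
]

# Keep the first posting seen for each normalized (company, title) key.
unique = {}
for company, title in postings:
    key = (dedup_key(company), dedup_key(title))
    unique.setdefault(key, (company, title))

print(len(unique))  # 2: the first two postings collapse into one
```

The sketch keeps diacritics and only treats differently cased variants as equal; it does not strip accents.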

The reduction in job postings volume due to improved deduplication will be reflected in Analyst, our Global Job Postings API and the global job postings table in Snowflake.

Which countries are affected?

Because of the improvement in deduplication, we expect overall job postings volume to decrease by country as follows:

Country          Decrease
Brazil           16.2%
Luxembourg       4.8%
France           3.7%
Germany          2.8%
Portugal         2.7%
Switzerland      2.1%
Austria          1.6%
Netherlands      0.5%
India            0.4%
Ireland          0.4%
South Africa     0.3%
Spain            0.3%
Mexico           0.2%
Hong Kong        0.2%
Romania          0.1%
Belgium          0.1%
Poland           0.1%
Italy            0.1%
Sweden           0.1%
Argentina        0.1%
Denmark          0.1%
Czech Republic   0.1%

Note: These percentages may vary slightly because new data was gathered between this analysis and the publication date.
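If you want a rough idea of what a percentage means for your own counts, the arithmetic can be sketched as below. The helper name `estimated_volume` and the starting count of 10,000 are hypothetical, and the sketch assumes each percentage is relative to the pre-change volume. Actual results may differ slightly, as noted above.

```python
def estimated_volume(previous_count: int, decrease_pct: float) -> int:
    """Estimate a posting count after a percentage decrease (hypothetical helper)."""
    # Assumption: the decrease is expressed relative to the previous count.
    return round(previous_count * (1 - decrease_pct / 100))


# Example: 10,000 postings in Brazil with the published 16.2% decrease.
print(estimated_volume(10_000, 16.2))  # 8380
```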

When will this change go live?

This change is slated to go live on June 17th.

Why are some countries more affected than others?

We see the biggest improvements in countries that use a lot of diacritic characters and where there was more variation in capitalization/case across sources that advertise the same role.

In addition to this change, we continually carry out routine curation work on our dataset. Detailed changes can be found here.
For additional information or questions on this change, please contact [email protected].
