Google has said that “up to 17% of pages on the web” are duplicates. On any website with more than 100,000 pages, some duplication is almost inevitable.
This is why Google announced in October 2018 that it would take steps to reduce such content and penalize sites with large amounts of duplicate content. The company had already tested this approach, ranking results lower when it detected a high degree of similarity between two or more pieces of content.
Google is introducing a new ranking signal called “Page Quality.” This quality score takes both relevance and duplication into account, so that users see higher-quality webpages at the top of results.
The percentage of duplicate content Google is talking about here is the degree of similarity, or “percentage of relevancy,” between webpages.
Google has released a document stating that it will treat this “percentage of relevancy” as a key factor in how content is ranked. The document also includes examples in which two webpages were found to be closely related to each other, and Google ranked them together on the same results page.
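Google has not published how such a similarity percentage would be calculated. As an illustration only, the sketch below computes one common measure, Jaccard similarity over word shingles, between two page texts. The function names, the 5-word shingle size, and the example strings are assumptions for demonstration, not anything Google has confirmed.

```python
# Illustrative only: one way to express a "percentage of similarity" between two pages.
# This is not Google's method; it is a standard Jaccard-over-word-shingles sketch.
import re


def shingles(text: str, size: int = 5) -> set[str]:
    """Break text into lowercase word n-grams (shingles) of the given size."""
    words = re.findall(r"[a-z0-9']+", text.lower())
    return {" ".join(words[i:i + size]) for i in range(max(len(words) - size + 1, 1))}


def similarity_percent(page_a: str, page_b: str) -> float:
    """Jaccard similarity of the two pages' shingle sets, expressed as a percentage."""
    a, b = shingles(page_a), shingles(page_b)
    if not a or not b:
        return 0.0
    return 100.0 * len(a & b) / len(a | b)


if __name__ == "__main__":
    original = "Our guide explains how to prune apple trees in late winter."
    copied = "This guide explains how to prune apple trees in late winter for best results."
    print(f"Similarity: {similarity_percent(original, copied):.1f}%")  # roughly 55%
```

The higher the percentage, the more of the two pages’ wording overlaps; a score near 100% would indicate near-identical content.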
Google On Percentage That Represents Duplicate Content
Google has not revealed the exact percentage of similarity that it treats as duplicate content. The company has only said that it is in the high 90s, and that a machine learning system finds and flags duplicate content, with listings removed manually once duplicates are caught.
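Google has not described how this flagging works, so the following is purely a hypothetical sketch: given a similarity measure between pages, anything above an assumed 90% threshold is treated as a near-duplicate and only one listing is kept. The threshold, the character-level similarity measure, the “keep the longer page” rule, and the example URLs are all assumptions for illustration.

```python
# Hypothetical sketch: flag near-duplicate pages and keep one listing per group.
# The 90% threshold and "keep the longer text" rule are illustrative assumptions.
from difflib import SequenceMatcher
from itertools import combinations


def similarity_percent(a: str, b: str) -> float:
    """Rough character-level similarity between two texts, from 0 to 100."""
    return 100.0 * SequenceMatcher(None, a.lower(), b.lower()).ratio()


def dedupe_listings(pages: dict[str, str], threshold: float = 90.0) -> list[str]:
    """Return the URLs to keep after dropping pages flagged as near-duplicates."""
    dropped: set[str] = set()
    for (url_a, text_a), (url_b, text_b) in combinations(pages.items(), 2):
        if url_a in dropped or url_b in dropped:
            continue
        if similarity_percent(text_a, text_b) >= threshold:
            # Keep the longer page; drop the other from the listings.
            dropped.add(url_b if len(text_a) >= len(text_b) else url_a)
    return [url for url in pages if url not in dropped]


if __name__ == "__main__":
    pages = {
        "example.com/a": "How to prune apple trees in late winter.",
        "example.com/b": "How to prune apple trees in late winter!",
        "example.com/c": "A recipe for apple pie.",
    }
    print(dedupe_listings(pages))  # /b is flagged as a duplicate of /a and dropped
```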