What Is Duplicate Content On a Website? + Tools To Detect And Fix It

By CKMAdmin

Duplicate content can, without a doubt, weigh down the SEO positioning of our website or eCommerce store. And yet it is one of the errors we pay the least attention to when optimising.

This problem alone is reason enough for many of the pages on your site, perhaps even those already reasonably well positioned, to drop out of the top search results over time.

Why? Simple: if we consider that Google constantly cleans up its SERPs to show higher-quality, more relevant results to its users, we should not be surprised that it frowns upon duplicate content.

Would you like to find the exact same text and information in almost every result on the first pages of Google?

I guess not. You wouldn't like that situation, nor do I think anyone would. But luckily, whether the duplication was voluntary or involuntary, these mistakes have a solution.

In this article, we will learn how to detect duplicate content (with the help of some of the SEO tools I use) and look at the most effective solution for each case. But first, as I usually do in all my guides and tutorials, I would like to set the scene and give you a definition of the topic that concerns us today.

What Is Duplicate Content On a Website?

Duplicate content, in SEO terms, occurs when text is replicated partially or entirely across different URLs, whether on pages within the same domain (internal duplication) or on pages of other websites (external duplication). It can also happen when two or more URLs lead to the same page within your domain.

External duplication often comes from third-party copying or plagiarism. Internal duplication, by contrast, usually stems from errors in the structure of our site that cause multiple URLs to lead to the same page, or from reusing much of the same text in the descriptions of two or more pages.

As a rule of thumb, content is considered duplicate when more than 30% of it is already published at other URLs. Conversely, it can be regarded as original when at least 70% of the text on a page is not identical to text found elsewhere. For example, a 1,000-word page that shares more than roughly 300 words verbatim with another URL would be at risk.

Can I Be Penalised For Duplicate Content?

In my experience, duplicate content and keyword cannibalisation reduce the perceived quality of our pages in Google's eyes rather than triggering an outright penalty. The result, however, is still a significant loss of positions in the SERPs.

Of course, if your website continually abuses these practices, you will most likely end up being penalised by Panda (the Google algorithm that polices these issues).

Anyway, penalty or not, you should be clear that search engines do not welcome these things. In addition, the huge progress in their algorithms allows them to detect these copied texts more and more easily (especially within the same website).

Why Is Duplicate Content So Negative For My Website?

Now that you know what we are talking about and what I mean by this common SEO problem, here are some of the consequences it can cause:

Low-Quality Content

This problem can lower the quality of your pages in the eyes of both users and Google. The version Google selects to show may not be the one you want, so a lower-quality "copy" can end up being shown to users, and it will rank worse.

Decreased Organic Visibility

In short, if your pages lose quality, they also lose positions. And that drop in the SERPs leads to a decrease in your online visibility and in the traffic you receive from search engines.

Decrease in Conversions

If you have several pages with similar text, the search engine has to choose the page it considers best for that search intent, and the one it picks may not be the most convenient for your business strategy.

Wrong Authorship Attribution

When it detects two similar URLs on different domains, the search engine chooses the original version based on the indexing date and/or the site's popularity.

In other words, Google could wrongly decide which version is the original and punish the wrong website, especially if yours has little authority on the Internet. If you always act professionally and write your own texts, I can understand how outrageous that feels.

That is why it is essential to scan the Internet regularly for copies of your original content, since the party unfairly harmed could end up being you.

Loss Of Authority

As with cannibalisation, duplicate pages can weaken your website. In addition, the links you receive may point to different URLs covering the same topic, so instead of joining forces to boost one page's positioning, their authority is split.

Problems in Indexing

Indexing can also be affected, because the search engine only spends a limited amount of time crawling your pages. If that time is wasted on an excess of low-quality or duplicate pages, the search engine may leave part of your site unvisited.

10 Tools To Know If I Have Duplicate Content On My Website

To analyse duplication, the most sensible approach is to start with titles, headings, descriptions and similar elements. The most effective way to identify it is with tools.

And when I say tools, I am not only talking about platforms or software created for this purpose, but also about search methods such as the "site:" operator, which I will cover later:

Google Search Console

It is one of the best starting points. To analyse this and other questions related to your domain, sign up for Google's webmaster tools and go to "Search Appearance" and then "HTML Improvements".

Next, look at the duplicate title tags and meta descriptions. You will find the existing duplicates and the pages they appear on, so you can correct them.

Google Search Console is an excellent option for detecting duplication within your own website.

SEMrush

As you know, SEMrush, in addition to being one of my favourite tools, is also one of the most complete, and as such it includes a way to detect whether you have any problem of this type.

It has a complete “SEO Audit” tool for a website, where duplicate content can be easily identified.

Screaming Frog

Thanks to Screaming Frog, you can crawl a site in search of duplication, among the many other features this powerful SEO tool offers. To do this, use the "Duplicate" filter on the Page Titles, URL, H1 and Meta Description tabs.

It is not a free tool; however, its outstanding features may make it worth paying for.

Google Analytics

If you open the "Behavior" > "Site Content" > "Landing Pages" report, you can also uncover duplication. Here you can look for pages and URLs that receive less organic traffic than they should.

Plagiarism

Thanks to this online plagiarism tool, you can find out whether a text is original or matches one already published on the web by pasting it into the space provided.

In addition, if you save your posts in the cloud before publishing them on your blog, you can easily upload your PDF file from Google Drive.

Given how quickly and simply it tells you whether the chosen content is copied from something that already exists on the Internet, it is one of my favourites. In fact, it is one of the verification tools I use with my team at JF-Digital. You can also download it and install it locally.

Quetext

It is a simple and intuitive online platform that, once you paste the text in question into the space provided, gives you all the information you need to know whether it is copied or original.

With Quetext, you can see exactly which other websites have already published text identical to the one you submitted. It highlights the matching fragments, which you should therefore not publish on your page if you do not want to be penalised.

There is also a wealth of site-auditing tools that identify broken links, unindexed pages, duplication and other issues that are harder to detect, such as SEMrush (covered above) or Siteliner (covered below), among others.

CopyScape

In addition to being one of my favourites, it is widely used by many Digital Marketing professionals.

With Copyscape, you can enter the URL of your site and check whether any other text on the web is identical to yours. That way, you can contact the person responsible and ask for an explanation.

Command "site:" + "Keyword" in Google

This command searches Google for indexed pages of your website that contain a particular phrase or specific keyword (or products, if we are talking about an online store in PrestaShop or similar).

Among the results, you can check whether pages are indexed in Google with duplicate titles or descriptions and whether some have been relegated to the supplemental index. It is also an excellent method for finding SEO cannibalisation.
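For example, a couple of minimal searches (the domain and phrases here are placeholders, not real pages) would look like this:

site:yourdomain.com "a sentence copied from one of your product descriptions"
site:yourdomain.com intitle:"your exact page title"

The first query lists indexed pages of your domain that contain that sentence; the second lists indexed pages sharing the same title. If several results come back, those URLs are candidates for duplication or cannibalisation.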

Virante Tools

It efficiently checks the fundamental aspects a blog must meet to avoid duplication.

Ideally, all the checks should be green; if you find any in red, that is where you need to work to correct the error.

If the first check in Virante Tools is red, it means the URL is not canonical and the URL format has not been set correctly. This is the mistake that most often generates the problem we are dealing with today.

SiteLiner

With this online tool, you can detect duplicate content on a small website; its free version analyses a maximum of 250 pages.

So, at least to get started, SiteLiner is more than enough.

How Can I Remove Duplicate Content On My Blog or Website?

It should already be clear that search engines do not like duplication, because it impoverishes the user experience. So, if you detect it, you must do everything possible to eliminate it.

If you have duplicate content on your own site, there are several ways to fix it or to make sure search engines know which version you want them to treat as the "primary" one.

The problem is that some of these fixes require a little programming, and not everyone is in a position to write code in the right places on their website.

If you do not master the HTML language, I advise you to seek a specialist’s help or hire their services.

At this point in the article, you should understand how important it is not to have duplicate content if you do not want your website relegated to the lowest positions (or pages) of the Google SERPs.

The most common ways of dealing with duplication on our website are:

Change The Text

This is one of the simplest methods, yet one of the least used. If you have two very similar pages and want to rank both in the search engines, rewrite the content of one of those URLs to make it as original as possible.

Make a “Canonical”

The rel="canonical" tag was created to deal with precisely this situation. For example, it is widely used in eCommerce when we have products with very similar descriptions.

The rel="canonical" tag is a line of code inserted into the <head> of your page that tells search engines which URL is the original version. It therefore prevents them from cataloguing these contents as duplicates.
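As a minimal sketch (the domain and paths are placeholders), the tag placed on a duplicate or variant page points to the preferred URL:

<head>
  <!-- On https://example.com/product-red (the variant), declare the preferred version -->
  <link rel="canonical" href="https://example.com/product" />
</head>

On WordPress or PrestaShop, an SEO plugin or module can usually insert this tag for you without touching the template code.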

Bear in mind that, with this attribute, the search engines decide what to do with those pages; that is, they decide whether to index all of them or only the main (canonical) one.

This is a solution anyone can implement, although you do need some knowledge of HTML to put the tag in the right place, or a plugin/module that does the job for you.

301 Redirect

This is the best option when using the previous tag is not feasible, or when two indexed URLs lead to the same place.

With a 301 redirect, visitors are automatically sent from one page to the other page we are interested in.

That is, you can use it mainly in two situations:

1. If you have two identical or highly similar pages and, for whatever reason, you cannot use a canonical, redirect one to the other (weighing up their relevance or importance, since one of them will disappear from your visitors' view).

2. Visitors to your website may reach the same landing page from different URLs. By making a 301 redirect from all those URLs to the primary or correct one, you send your visitors and the search engines to a single URL, wherever they come from.

In doing so, we also tell the search engines which URL is the "correct" one and which they should index.
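As a minimal sketch, assuming an Apache server that allows .htaccess rules (the paths are placeholders), a 301 redirect looks like this:

# .htaccess: permanently redirect the duplicate URL to the preferred one
Redirect 301 /old-duplicate-page https://example.com/preferred-page

On other servers (Nginx, IIS) or in WordPress, the same permanent redirect can be set up with the equivalent directive or a redirection plugin.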

Through URL Parameters

If the duplicate content is produced by specific URL parameters, you can tell Google which ones to ignore from the "Crawl" > "URL Parameters" section of Search Console (Webmaster Tools).

The idea is much the same as with robots.txt: the search engines are told which URLs to index and which to ignore.

This method is especially handy for eCommerce sites offering different sizes and colours of the same product. The base URL is the same for every size and colour variant, only the parameters change, but the webmaster will usually only want to highlight one of them, with the general description of the product.
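For illustration, assuming a hypothetical product page, the variant URLs might look like this; only the first, parameter-free version is the one you would normally want indexed:

https://example.com/shop/t-shirt
https://example.com/shop/t-shirt?size=m&colour=blue
https://example.com/shop/t-shirt?size=l&colour=red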

Robots.txt

This is another action you can take to avoid duplication on your pages.

If, for any reason, you cannot redirect or delete the page with duplicate content, this is the best option to avoid the dreaded sanctions.

With the robots.txt file, we tell search engines which pages or files to ignore or block, so they do not waste a single millisecond on them.
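As a minimal sketch, with placeholder paths for the duplicated sections you want crawlers to skip:

# robots.txt, placed at the root of the domain
User-agent: *
# Block crawling of duplicated or low-value sections (placeholder paths)
Disallow: /print-versions/
Disallow: /search-results/

Bear in mind that robots.txt blocks crawling rather than indexing, so a blocked URL can still appear in results if other sites link to it; for pages that must stay out of the index, the noindex tag discussed below is safer.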

Sole Editor

If you write your blog yourself, you should know that author pages can generate duplicate content. In WordPress, they usually look like this:

https://domain.com/author/your-name

The solution is straightforward: mark the author pages as "noindex, follow", which tells the search engine not to index those URLs.
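As a minimal sketch, this is the tag placed in the <head> of the author page (most WordPress SEO plugins can add it for you without touching code):

<head>
  <!-- Keep this page out of the index, but let crawlers follow its links -->
  <meta name="robots" content="noindex, follow" />
</head>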

This really only needs to be done when there is a single author; in blogs or digital magazines with several authors, either option is fine.

Do Not Abuse Tags or Categories on Your Blog

The use of categories and tags can be hazardous to your SEO positioning.

In a typical blog, contrary to what you may think, indexing these archive pages usually only generates duplicate content or cannibalisation.

Still, if you want to index your blog's categories and/or tags, do it strategically and with great care: do not generate industrial quantities of them, only ones with coherence and meaning.

If you are unsure, set the "noindex, follow" meta robots option for them in your SEO plugin (if you use WordPress); that way, you will not create duplicate content with them.

How To Fix Duplicate Content When It’s Off Your Site?

After reviewing all the ways to detect duplicate content on our own website and the different ways to solve the problem, let's now see what we can do when it appears on someone else's domain.

Option 1: Request That the Content Be Removed

In this case, you can "kindly" request that the content be removed, via email, social media, or a contact form.

Perhaps the person who plagiarised you does not know how bad that can be for both of you.

If this first contact does not bear fruit, which unfortunately happens quite often, we have to take a second step.

Option 2: A Canonical Link

Although I find this harder to achieve, if they do not want to delete the text, you can ask them to point it to your page with a "canonical" link.

That way, the search engine will find the original content, and neither of you will run the risk of being penalised.
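As a rough sketch (the domains and path are placeholders), the copying page on the other site would carry a cross-domain canonical pointing back to your original:

<head>
  <!-- On the copying site's page, point search engines to the original article -->
  <link rel="canonical" href="https://yourdomain.com/original-article" />
</head>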

Option 3: A Formal Request to Google

This second option may not work either, which also happens frequently.

So we move on to "bigger words": asking Google to de-index the offending URL.

To do this, you must file a removal request under US copyright law (the DMCA).

You can also submit a spam report to Google through THIS LINK or, if you do not have a Webmaster account, report the blog from the "Spam Report" section (ACCESS FROM HERE).

In all these cases, it is wise to keep a record of the efforts you have already made to resolve the problem on your own.

Keep a copy of the emails or messages you have exchanged with the webmaster of the site that duplicated your content.

Option 4: Local Law Enforcement

The last option you have, if everything else fails, is to turn to law enforcement in your country so that the current legislation can be applied.

You would file a complaint for plagiarism, since publishing texts online is treated much the same as publishing them on paper.

Everything you publish digitally is automatically copyrighted; therefore, you can go to court to have the offender ordered to remove the plagiarised content.

And, where applicable, to compensate you financially for the damage caused.

This last method may seem too extreme, but you must remember that many people make a living from their website or blog (as in my case).

The visitors who reach them hire their services, buy their products or simply click on their ads.

Duplicate content on this type of site can lower that income considerably.