Semalt Expert Explains What Content Duplication Is And How It Affects SEO
Inconspicuous errors can often harm a website's position in search engines. One of them is content duplication. From this article, you will learn how to check if someone is copying your articles, as well as the consequences of duplicate content and tips on how to avoid this phenomenon yourself.
What is content duplication?
Duplicate content is identical or very similar text that appears at more than one URL on the Internet. Such duplication is usually unintentional, but it confuses search engine bots, so it is better to avoid it for the sake of your website's positioning. There are two kinds: external and internal duplication. External duplication means that the same text appears on different websites, for example because it was copied from one site to another.
We can cause this ourselves, for example by sharing the same information on our social media channels as on our blog, by publishing the same article on our blog and on a partner's website, or by copying product descriptions from a manufacturer's website. Unfortunately, we are often unaware that someone (e.g. a competitor) has copied our texts to their own media. Duplication can also happen automatically, through programs that download texts, product descriptions, photos, or information about a specific person or company from a given website.
Internal duplication, which is less harmful to positioning, means repeating the same content within your own website. In both cases, duplicate content is worth eliminating. It is therefore important to know how content gets duplicated: by ourselves (for example, by publishing the same description on several subpages), automatically by various tools, or knowingly by third parties. In practice, most cases come from content repeated across a website's own pages.
Duplicate content - reasons
The problem of duplicate content is quite common. It is caused not only by human error or disregard for copyright, but also by technical issues. To eliminate duplicate content, you need to know its causes. The most common ones are listed below.
Separate versions of the same pages
If a page is available over both HTTP and HTTPS, or both with and without "www", and the content is the same in every version, the content is duplicated. For Google, these are physically separate pages (up to four of them) with the same content, which compete with each other for a better position. It is worth implementing a global 301 redirect from HTTP to HTTPS and to a single preferred host, either with or without "www".
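To illustrate the redirect rule, here is a minimal Python sketch that maps every HTTP/HTTPS and www/non-www variant of a URL to one preferred address. It assumes, hypothetically, that https://www.example.com is the preferred version. In practice the redirect itself is configured on the web server; this function only shows which target each variant should be redirected to.

```python
from typing import Optional
from urllib.parse import urlsplit, urlunsplit

# Assumption (hypothetical): the site has chosen https://www.example.com as its preferred version.
PREFERRED_SCHEME = "https"
PREFERRED_HOST = "www.example.com"


def preferred_url(url: str) -> str:
    """Map an HTTP/HTTPS, www/non-www variant of a URL on this site to the preferred version."""
    parts = urlsplit(url)
    host = (parts.hostname or "").lower()  # note: .hostname drops any port number
    # Treat example.com and www.example.com as the same site.
    if host.removeprefix("www.") == PREFERRED_HOST.removeprefix("www."):
        host = PREFERRED_HOST
    # Keep the path and query; the #fragment is dropped.
    return urlunsplit((PREFERRED_SCHEME, host, parts.path or "/", parts.query, ""))


def redirect_target(url: str) -> Optional[str]:
    """Return where a 301 redirect should point, or None if the URL is already the preferred one."""
    target = preferred_url(url)
    return None if target == url else target


for variant in [
    "http://example.com/blog/",
    "http://www.example.com/blog/",
    "https://example.com/blog/",
    "https://www.example.com/blog/",
]:
    print(variant, "->", redirect_target(variant))
```

Only the last variant needs no redirect; the other three should all point permanently to https://www.example.com/blog/.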
Identical language versions
Unfortunately, few people realize that creating different language versions of a website can cause duplication. That is why it is so important to implement the "hreflang" attribute correctly. If the language is switched only by a script parameter, the English and, for example, German versions can end up at the same address. In addition, when not all subpages are translated, the untranslated content appears in two places.
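As an illustration of correct hreflang markup, the minimal Python sketch below generates the alternate-language link tags for a page that has a separate URL for each language. The URLs and the choice of English as the "x-default" version are hypothetical; in practice a CMS or plugin usually produces these tags.

```python
from html import escape


def hreflang_links(versions: dict[str, str], default_lang: str) -> list[str]:
    """Build the alternate-language <link> tags for one page.

    `versions` maps a language code ("en", "de", ...) to the URL of that
    language version. Each version should have its own URL, and the same
    full set of tags goes into the <head> of every version.
    """
    tags = [
        f'<link rel="alternate" hreflang="{lang}" href="{escape(url)}">'
        for lang, url in sorted(versions.items())
    ]
    # x-default marks the version to show when no listed language matches the user.
    tags.append(
        f'<link rel="alternate" hreflang="x-default" href="{escape(versions[default_lang])}">'
    )
    return tags


# Hypothetical example: English and German versions at separate paths, not a ?lang= parameter.
for tag in hreflang_links(
    {"en": "https://www.example.com/en/offer/", "de": "https://www.example.com/de/offer/"},
    default_lang="en",
):
    print(tag)
```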
Copying product descriptions
A frequent cause of duplicate content is product descriptions in online stores copied from the manufacturer's website. Copying those texts and pasting them onto your own pages is not a good solution. Prepare a separate description for each product subpage, and do not reuse it on social media or on other product subpages, as Google treats this as duplication.
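Before publishing a product page, you can roughly check how close your description still is to the manufacturer's. The sketch below uses Python's standard difflib module to compare the two texts word by word. The sample texts and the 0.8 threshold are hypothetical, and this score is only a rough aid, not the way Google measures duplication.

```python
from difflib import SequenceMatcher


def similarity(a: str, b: str) -> float:
    """Rough 0.0-1.0 similarity of two texts, compared as sequences of lowercase words."""
    return SequenceMatcher(None, a.lower().split(), b.lower().split()).ratio()


# Hypothetical texts for illustration.
manufacturer = "Stainless steel kettle with a 1.7 litre capacity and automatic shut-off."
copied = "Stainless steel kettle with a 1.7 litre capacity and automatic shut-off."
rewritten = "Boil water for the whole family: this 1.7 l steel kettle switches itself off when done."

THRESHOLD = 0.8  # arbitrary cut-off chosen for this example, not a Google value

for name, text in [("copied", copied), ("rewritten", rewritten)]:
    score = similarity(manufacturer, text)
    verdict = "too close, rewrite it" if score >= THRESHOLD else "sufficiently different"
    print(f"{name}: {score:.2f} ({verdict})")
```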
Duplicate title and meta tags
If separate subpages do not have their own unique title tags and meta description tags, the tags are duplicated, which negatively affects website positioning.
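Duplicate tags are easy to spot once the titles and meta descriptions of all pages have been collected, for example with a crawler. Here is a minimal Python sketch, assuming the crawl data is already available as a dictionary; the URLs and tag values are hypothetical.

```python
from collections import defaultdict


def find_duplicate_tags(pages: dict[str, dict[str, str]], tag: str) -> dict[str, list[str]]:
    """Group URLs that share the same value of `tag` ("title" or "description").

    Values are compared after collapsing whitespace and lowercasing.
    Only values used on more than one URL are returned.
    """
    groups: dict[str, list[str]] = defaultdict(list)
    for url, tags in pages.items():
        value = " ".join(tags.get(tag, "").split()).lower()
        if value:  # skip pages where the tag is missing or empty
            groups[value].append(url)
    return {value: urls for value, urls in groups.items() if len(urls) > 1}


# Hypothetical crawl results: URL -> title and meta description found on that page.
pages = {
    "https://www.example.com/kettles/": {"title": "Kettles | Example Shop", "description": "Buy kettles online."},
    "https://www.example.com/toasters/": {"title": "Kettles | Example Shop", "description": "Buy toasters online."},
    "https://www.example.com/mugs/": {"title": "Mugs | Example Shop", "description": "Buy kettles online."},
}

for tag in ("title", "description"):
    for value, urls in find_duplicate_tags(pages, tag).items():
        print(f"duplicate {tag} {value!r}: {urls}")
```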
Posting the same content
Duplicate content that harms a site is often caused by a lack of awareness. You should not publish the same texts on your blog, in your store, on Facebook and on Instagram, even in fragments, because they would be perceived as duplicate content.
Copying content from other sites
Remember that for Google, content on Wikipedia will be more attractive than the same content on our website. Content should be relevant and valuable, but also unique.
Duplication of your content by other sites
Articles on specialized topics or research results are sometimes copied, and if we sell products, e.g. in the B2B industry, store owners may copy our product descriptions. This is a common situation. In that case, it is worth reserving the rights to the texts and, as a producer or distributor, making your clients aware of the problem of duplicate content.
Pagination of comments
Even a tool as popular as WordPress has a problem with comment pagination. When comments are split across several pages, each comment page gets its own numbered URL (e.g. /comment-page-2/) that repeats the article content, so the article is duplicated under several addresses.
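To find affected URLs, you can scan a list of crawled URLs for comment pages. The sketch below assumes WordPress's usual /comment-page-N/ format with "pretty" permalinks, and the URLs are hypothetical. Once found, these pages can be handled with one of the methods described later in this article, or avoided by turning off "Break comments into pages" in the WordPress Settings > Discussion screen.

```python
import re

# Assumption: comment pages end in /comment-page-N/ (WordPress with pretty permalinks).
COMMENT_PAGE = re.compile(r"/comment-page-\d+/?$")


def article_url(url: str) -> str:
    """Strip a trailing comment-page-N segment to get the URL of the article itself."""
    return COMMENT_PAGE.sub("/", url)


def comment_page_duplicates(urls: list[str]) -> dict[str, list[str]]:
    """Map each article URL to the comment-page URLs that repeat its content."""
    found: dict[str, list[str]] = {}
    for url in urls:
        if COMMENT_PAGE.search(url):
            found.setdefault(article_url(url), []).append(url)
    return found


# Hypothetical list of crawled URLs.
crawled = [
    "https://www.example.com/my-post/",
    "https://www.example.com/my-post/comment-page-1/",
    "https://www.example.com/my-post/comment-page-2/",
]
print(comment_page_duplicates(crawled))
```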
URL tracking and sorting parameters
URL tracking, i.e. creating separate URLs to monitor marketing activities, can also contribute to duplicate content. This applies not only to UTM parameters but to any other additions to a URL that distinguish it without changing its essential part. Sorting is one example: it makes the same content appear under several very similar, but technically different, URLs. The solution in this case is to canonicalize the URLs.
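Canonicalizing here means settling on one URL per piece of content. The minimal Python sketch below computes that single URL from any tracked or sorted variant. It assumes that UTM parameters and the sort, order, gclid and fbclid parameters never change what the page shows; that list is an assumption to adjust for your own site.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Assumption: these parameters only track visits or re-order the list;
# they never change which content the page shows.
IGNORED_PARAMS = {"sort", "order", "gclid", "fbclid"}


def canonical_url(url: str) -> str:
    """Drop tracking/sorting parameters and put the remaining ones in a fixed order."""
    parts = urlsplit(url)
    kept = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if not key.lower().startswith("utm_") and key.lower() not in IGNORED_PARAMS
    ]
    kept.sort()  # ?a=1&b=2 and ?b=2&a=1 should map to the same URL
    return urlunsplit((parts.scheme, parts.netloc, parts.path, urlencode(kept), ""))


print(canonical_url("https://www.example.com/shoes/?utm_source=newsletter&sort=price&color=red"))
```

The resulting URL is the one you would put in the rel="canonical" link of every variant.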
Indexed developer site
If Google has indexed the "draft" version of a page, set up a 301 redirect from the development version to the final version of the website. If the development version is still in use, you can block it in robots.txt. It is worth analyzing the causes above and checking whether any of them are present on your website. This will allow you to eliminate practices that cause duplicate content and harm the condition of the site.
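You can check that a development robots.txt really blocks crawlers with Python's built-in urllib.robotparser module. The dev host and file below are hypothetical. Keep in mind that robots.txt blocks crawling rather than indexing: a blocked URL can still appear in results if other sites link to it, so password protection is the safer option for a development site.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt served at https://dev.example.com/robots.txt:
# it disallows every path for every crawler.
DEV_ROBOTS_TXT = """\
User-agent: *
Disallow: /
"""

parser = RobotFileParser()
parser.parse(DEV_ROBOTS_TXT.splitlines())

for url in ["https://dev.example.com/", "https://dev.example.com/blog/new-post/"]:
    # can_fetch() answers: may this user agent crawl this URL under the rules above?
    print(url, "crawlable by Googlebot:", parser.can_fetch("Googlebot", url))
```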
Duplicate content on category pages
Duplication can also occur when a category description in an online store is repeated on every filtered, sorted, or paginated version of the category page. To avoid this error, use the "noindex" meta tag or the "rel=canonical" attribute. The site is also harmed by identical category descriptions, for example when a product is assigned to several categories and a poor page structure makes the same content appear at several URLs.
Content duplication and SEO
Positioning a page is difficult when content is duplicated within the site. Google values unique, high-quality, valuable texts. Sites with duplicate content get less organic traffic because they are assessed negatively by Google's bots. Crawlers cannot tell which page is the original, so they usually treat one of them as the original and ignore the others. Sites with a great deal of duplicate content are sometimes penalized or removed from the index altogether, but this happens only when the copying is clearly intentional.
More often, Google simply does not index some subpages, and on an extensive website with many non-indexed subpages it is hard to keep the site in good condition. Penalties and complete removal from the index are rare. But a page that bots do not find attractive because of duplicate content still loses its chance of ranking higher.
How to deal with duplicate content?
There are three main ways to deal with duplicate content. In each of them, the key is to tell the search engine which URL should be indexed:
- 301 redirect - configuring a 301 redirect from the duplicate page to the original one usually solves the problem. The pages stop competing with each other in the rankings; instead, their signals are combined, which increases the popularity of the page you have chosen.
- rel="canonical" attribute - it tells the search engine that the page should be treated as a copy of the original page, so links and ranking signals are attributed to the original page whose URL you provide.
- meta robots "noindex" - this is a good solution, especially for duplicate content caused by pagination. The meta tag keeps the page itself out of the index while still allowing crawlers to follow the links on it. It can be added to the <head> section of any page that should be excluded from indexing.
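The canonical and noindex options above are both plain markup in a page's <head>. The minimal Python sketch below produces that markup for a duplicate page; the URL is hypothetical. The 301 redirect is not shown because it is configured on the web server rather than in the page, and the two tags are normally used as alternatives, one per page, as in the list above.

```python
from html import escape
from typing import Optional


def duplicate_handling_tags(canonical: Optional[str] = None, noindex: bool = False) -> str:
    """Return <head> markup telling search engines how to treat a duplicate page."""
    tags = []
    if canonical:
        # Points search engines to the original URL whose ranking signals should be kept.
        tags.append(f'<link rel="canonical" href="{escape(canonical)}">')
    if noindex:
        # Keeps this page out of the index; "follow" lets crawlers still follow its links.
        tags.append('<meta name="robots" content="noindex, follow">')
    return "\n".join(tags)


# Hypothetical duplicate page pointing at its original.
print(duplicate_handling_tags(canonical="https://www.example.com/original-article/"))
```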
How to detect duplicate content?
The most important thing is to be aware of the duplicated content and locate it on the website. Several tools make this possible:
1. The Dedicated SEO Dashboard - a very powerful tool for performing a complete analysis of a website. It reports the percentage of duplicated content on the site and shows which URLs contain the duplicate texts, so you can remove them. With this tool, we can also detect third parties who copy our content without permission.
In addition, this tool analyzes duplicate external content, highlights copied text and lists the URLs of pages with similar content.
Moreover, the tool has other useful features that you can use to handle most of your SEO tasks.
Among these functions, you can find:
- Competitor analysis
- Keyword research
- Technical SEO audit
- SEO reports
Feel free to dive into its remaining data-driven features at demo.semalt.com
2. Google Search Console - helps you optimize your website. Go to the Status tab and check for possible technical errors, for example, the same content appearing at different addresses. Google Search Console is a free tool.
3. Google search - this is tedious but feasible, especially when you need to check a short text rather than an entire page. Paste a fragment of the text into Google, in quotation marks, and see which pages are displayed.
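The tools above report how much text two pages share, but how the dashboard calculates its percentage is not described here. One common way to estimate such overlap yourself is w-shingling: split each text into overlapping runs of k words and compare the two sets with Jaccard similarity. Here is a minimal Python sketch, with k = 5 chosen arbitrarily and hypothetical sample texts.

```python
import re


def shingles(text: str, k: int = 5) -> set[tuple[str, ...]]:
    """Return the set of overlapping k-word sequences ("shingles") in a text."""
    words = re.findall(r"\w+", text.lower())  # lowercase words, punctuation ignored
    return {tuple(words[i:i + k]) for i in range(len(words) - k + 1)}


def overlap(a: str, b: str, k: int = 5) -> float:
    """Jaccard similarity of the shingle sets: 0.0 = nothing shared, 1.0 = same shingles."""
    sa, sb = shingles(a, k), shingles(b, k)
    if not sa or not sb:  # a text shorter than k words has no shingles
        return 0.0
    return len(sa & sb) / len(sa | sb)


original = "Duplicate content is identical or very similar text found at more than one URL."
suspect = "Duplicate content is identical or very similar text that is found at more than one URL."
print(f"{overlap(original, suspect):.0%} of shingles shared")
```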
Duplicate content - summary
Monitoring a website for duplicate content and eliminating the practices that lead to it is very important for SEO, because it translates into financial gains for the company. Appearing in organic search results brings long-term benefits, such as a low cost per lead, greater brand recognition, better visibility in the search engine and higher sales.
Duplicate content is one of the factors that harm the condition of the entire website. Simply being aware of its causes lets you at least partially eliminate the problem. Monitoring your website in this respect also lets you keep track of competitors' activities, which is always advisable and supports business development.