Duplicate Content & How to Handle It
It’s not uncommon to give a blog article multiple categories on your site. But maybe you vaguely recall something about duplicate content being bad for Google SEO. If the same article has many URLs – one for each category you assign it to – you have just duplicated it and, at best, are competing with yourself to get it ranked in search results. Worse still, you might even get penalised by Google for the duplication.
You know there’s a solution, but what was it again? Read on for how you can use this duplication trick and still keep the Google spiders happy.
Duplicate Content – When One Plus One Doesn’t Equal Two!
To be clear, when we refer to duplicate content here, we mean using one of your own compositions more than once on your own site. An entirely different problem arises when you copy somebody else’s content and put it on your own site. In that case, assuming the content itself is not spammy, it might be SEO neutral. In other words, it certainly won’t gain you any SEO points, because the original piece was indexed first and will rightly get the SEO benefit. Google tackled that issue with its Panda update (initially nicknamed the ‘Farmer Update’) in February 2011.
To return to the scenario outlined at the beginning, you wish to duplicate your own content on your own site. What’s wrong with that? It fills some extra space, and the content might logically belong in two distinct categories. In this case, it makes sense to duplicate content on your site, and it’s hardly helpful to be penalised for it, right?
We must stress that Google does not directly penalise you for duplicate content of this sort. Unfortunately, the cost is less obvious and can bring some SEO pain regardless, because several factors can combine to give you a headache. Let’s look separately at what we consider to be the ‘big three’:
- The duplicate content uses up your crawl budget
- The duplicate content is ignored
- The duplicate content might split your backlink vote
In the first case, Google has to spend additional time crawling your site to capture the duplicate content. As the spiders crawl your home page, they discover other links that lead deeper into your blog. But since Google can’t follow all those links forever, your site is allocated a crawl budget – roughly, the number of URLs Google is willing and able to crawl on your site in a given period. Your duplicated content has just taken a little of that budget away!
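To make the budget idea concrete, here is a toy Python sketch. It models the budget as a fixed number of page fetches, which is a simplification (Google’s real crawl scheduling is not public), and it uses a hypothetical site where one article is reachable under two category URLs. All URLs and the budget of six fetches are made up for illustration.

```python
from collections import deque

# Hypothetical site: each URL maps to the URLs it links to.
# The same article is reachable under two category paths.
DUPLICATE = "/cat/news/duplicate-content/"
SITE = {
    "/": ["/cat/seo/", "/cat/news/"],
    "/cat/seo/": ["/cat/seo/duplicate-content/", "/cat/seo/keyword-research/"],
    "/cat/news/": [DUPLICATE, "/cat/news/site-launch/"],
    "/cat/seo/duplicate-content/": [],
    "/cat/seo/keyword-research/": [],
    DUPLICATE: [],
    "/cat/news/site-launch/": [],
}

# The same hypothetical site with the duplicate category URL removed.
DEDUPED_SITE = {
    url: [link for link in links if link != DUPLICATE]
    for url, links in SITE.items()
    if url != DUPLICATE
}


def crawl(site, start, budget):
    """Breadth-first crawl that stops after `budget` fetches.

    Returns the fetched URLs in order. A toy model only: treating the
    budget as a simple fetch count is an assumption for illustration.
    """
    seen = {start}
    queue = deque([start])
    fetched = []
    while queue and len(fetched) < budget:
        url = queue.popleft()
        fetched.append(url)
        for link in site.get(url, []):
            if link not in seen:
                seen.add(link)
                queue.append(link)
    return fetched


if __name__ == "__main__":
    # With the duplicate present, the sixth fetch goes on the copy
    # and the site-launch post is never reached.
    print(crawl(SITE, "/", budget=6))
    # Without the duplicate, the same budget reaches the site-launch post.
    print(crawl(DEDUPED_SITE, "/", budget=6))
```

The point of the comparison is that the duplicate doesn’t add anything new for the crawler; it simply displaces a page that would otherwise have been fetched within the same budget.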
The second problem, where the duplicated content is ignored, occurs because Google wants to return diverse results in its search engine results pages (SERPs), so it will typically show one version and omit the duplicate from the result set. This isn’t really a penalty, but it can cause some head-scratching when the analytics come in and your duplicated page shows little or no traffic.
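Before the analytics leave you scratching your head, you can check which of your own URLs serve the same text. The Python sketch below is a minimal self-audit, not Google’s method (its duplicate detection is not public and also catches near-duplicates): it lower-cases the body text, collapses whitespace, hashes the result and groups URLs with matching hashes. The sample URLs and texts are hypothetical.

```python
import hashlib
import re
from collections import defaultdict


def fingerprint(text):
    """SHA-256 of the text after lower-casing and collapsing whitespace."""
    normalised = re.sub(r"\s+", " ", text).strip().lower()
    return hashlib.sha256(normalised.encode("utf-8")).hexdigest()


def find_duplicates(pages):
    """Group URLs whose body text is identical after normalisation.

    `pages` maps URL -> body text. Only groups of two or more URLs
    are returned. Exact matching only: near-duplicates slip through.
    """
    groups = defaultdict(list)
    for url, text in pages.items():
        groups[fingerprint(text)].append(url)
    return [sorted(urls) for urls in groups.values() if len(urls) > 1]


# Hypothetical pages: one article filed under two categories.
SAMPLE_PAGES = {
    "http://mysite/seo/duplicate-content": "Duplicate content and how to handle it.",
    "http://mysite/news/duplicate-content": "Duplicate  Content and\nhow to handle it.",
    "http://mysite/news/site-launch": "We have launched a new site.",
}

if __name__ == "__main__":
    print(find_duplicates(SAMPLE_PAGES))
```

Normalising before hashing means trivial differences such as extra spaces or capitalisation don’t hide a copy, while genuinely different pages still get different fingerprints.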
The third problem can be more serious. If other websites link back to your content, but do so inconsistently – sometimes linking to version A, other times to version B – your PageRank gets split. Each link to your site is treated as a vote. If half of the websites linking to you point to one page and the other half point to its duplicate, you’ve split your vote, and neither version carries the authority a single page would. When dealing with duplicates, double the content does not equate to double the PageRank. Under the right (or rather, wrong) circumstances, each version could end up with roughly half the PageRank!
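The arithmetic behind the split vote can be shown with a simple tally. This Python sketch treats every backlink as one equal vote, which real PageRank does not (it weights links by the authority of the linking page), and the backlinks and canonical mapping are hypothetical. It counts votes per URL, optionally folding each duplicate into the URL it names as canonical.

```python
from collections import Counter

# Hypothetical backlinks: each entry is the URL another site linked to.
BACKLINKS = [
    "http://mysite/mypage-one",
    "http://mysite/mypage-two",
    "http://mysite/mypage-one",
    "http://mysite/mypage-two",
]

# Hypothetical canonical map: duplicate URL -> canonical URL.
CANONICAL = {"http://mysite/mypage-two": "http://mysite/mypage-one"}


def tally(links, canonical_map=None):
    """Count one vote per backlink, crediting duplicates to their canonical URL."""
    canonical_map = canonical_map or {}
    return Counter(canonical_map.get(url, url) for url in links)


if __name__ == "__main__":
    print(tally(BACKLINKS))             # votes split two and two across both URLs
    print(tally(BACKLINKS, CANONICAL))  # all four votes credited to mypage-one
```

Even in this crude model, the same four links produce two half-strength pages in one case and one full-strength page in the other.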
Fortunately, there is a solution, and it has been around for quite some time. The answer is relationships! Now, before you think we’ve gone all ‘dating site’ on you, we specifically mean the ‘rel’ attribute of the HTML link tag.
So what is the ‘rel’ attribute and how do you use it? Firstly, it is short for ‘relationship’. It can take a variety of values (stylesheet and icon are common ones), but in the context of duplicate content we are only interested in one value, namely canonical.
Let’s say that URL ‘A’ points to one copy of the content and URL ‘B’ points to the duplicate. You need to tell Google which version is authoritative or, to use the parlance, canonical. Google treats this as a strong hint rather than a binding rule, but it respects it and actively encourages site owners to provide it.
By defining which version is canonical, you are telling the crawlers to consolidate the duplicates under one URL. That helps save crawl budget, stops your backlink vote from being split and means you no longer have to worry about omissions from SERPs, since you have already decided which URL should rank. So, once you have decided which URL points to your canonical (remember, that just means authoritative for our purposes) content, add the appropriate link tag to the head section of the non-authoritative version, as follows:
- assume the authoritative content is at URL http://mysite/mypage-one
- assume the non-authoritative content is at URL http://mysite/mypage-two
Then add the following link tag, with the rel=canonical attribute, to the <head> section of the non-authoritative page:
<link rel="canonical" href="http://mysite/mypage-one">
With the rel=canonical link in place, you make this preference known to Google, which can then index and serve only the canonical version of the page. Problem solved!
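If you want to confirm that a page actually carries the tag, the Python sketch below uses the standard library’s html.parser to collect the href of every link element whose rel includes canonical. The sample page is hypothetical, and the sketch does not check that the tag sits inside the head section, which is where it belongs; a result with zero or more than one URL is worth a closer look.

```python
from html.parser import HTMLParser


class CanonicalFinder(HTMLParser):
    """Collects href values from <link rel="canonical"> elements."""

    def __init__(self):
        super().__init__()
        self.canonicals = []

    def handle_starttag(self, tag, attrs):
        if tag != "link":
            return
        attributes = dict(attrs)
        # rel may hold several space-separated values, so split before checking.
        rel_values = (attributes.get("rel") or "").lower().split()
        if "canonical" in rel_values and attributes.get("href"):
            self.canonicals.append(attributes["href"])


def find_canonical(html):
    """Return every canonical URL declared in the given HTML string."""
    parser = CanonicalFinder()
    parser.feed(html)
    parser.close()
    return parser.canonicals


# Hypothetical non-authoritative page carrying the canonical link.
PAGE_TWO = """<!DOCTYPE html>
<html>
<head>
  <title>My page (category copy)</title>
  <link rel="canonical" href="http://mysite/mypage-one">
</head>
<body><p>Category copy of the article.</p></body>
</html>"""

if __name__ == "__main__":
    print(find_canonical(PAGE_TWO))
```

Self-closing tags such as <link ... /> are handled too, because HTMLParser passes them to handle_starttag by default.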
Having this trick in your locker brings an important refinement to your on-page SEO and adds organisation to the link architecture of your site. If you are working with an old-fashioned static site (yes, I audited one for a client just last week – they still exist!), this requires more manual effort on your part and can cause trouble (for example, if you later remove the authoritative URL but forget to remove the canonical link from the non-authoritative page).
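On a static site, a small script can catch that mistake. The Python sketch below assumes you already have a mapping from each page to the URL named in its canonical link (gathered however suits your site, for example by scanning each page’s head) plus the set of URLs that still exist; both inputs here are hypothetical. It reports every page whose canonical target has gone.

```python
def stale_canonicals(canonical_map, live_urls):
    """Return (page, target) pairs whose canonical target no longer exists.

    canonical_map: page URL -> URL named in its rel="canonical" link.
    live_urls: set of URLs still published on the site.
    """
    return sorted(
        (page, target)
        for page, target in canonical_map.items()
        if target not in live_urls
    )


# Hypothetical inputs: the authoritative retired-page was deleted,
# but old-copy still points at it.
CANONICAL_MAP = {
    "http://mysite/mypage-two": "http://mysite/mypage-one",
    "http://mysite/old-copy": "http://mysite/retired-page",
}
LIVE_URLS = {
    "http://mysite/mypage-one",
    "http://mysite/mypage-two",
    "http://mysite/old-copy",
}

if __name__ == "__main__":
    for page, target in stale_canonicals(CANONICAL_MAP, LIVE_URLS):
        print(f"{page} points to missing canonical {target}")
```

Running a check like this after removing pages is cheap insurance against leaving a copy pointing search engines at a URL that no longer exists.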
Many modern content management systems (CMSs), such as WordPress, Joomla and Drupal, have plugins available for managing canonical links and duplicate content for you, which take some of the labour out of it and reduce the likelihood of unhelpful human error! Has your SEO professional been managing this for you? Why not ask them and surprise them with your newly learned SEO knowledge? And if they hesitate in answering, ask them again. Sometimes there’s no harm in repeating yourself!