Wednesday, February 15, 2012

The Real Skinny On Duplicate Content

The issue of duplicate content is a rather contentious one in the world of search engine optimization (SEO). For many years, and perhaps still to this day, myths have floated around about duplicate content penalties.
Essentially, folks were told that if they produced duplicate content then they would be punished, and many thought that this meant you could not syndicate content, or post the same article on your site and another. Personally I think this was a ploy by Google to deter potential spammers.

You see, back when search engines first started to become popular, it was easy to rank in the top spots for your keywords by stuffing your website full of keywords. The black hat community used the same article on multiple pages and the search engines gave them good rankings. Thankfully, the algorithms are much more sophisticated these days and so are the users.
In reality, what Google was trying to say is that you could not post the same article over and over again on your own site in an attempt to manipulate your search engine rankings. In that sense, the duplicate content problem is specific to a single site.
The fact of the matter is that plagiarism exists. It would be nearly impossible to avoid having content ‘duplicated’ from one site to another. While there are ways of reporting people who steal your content, more often than not it is something that goes unpunished.
Life is made harder still by the fact that search engine algorithms do not have a good basis for determining who stole whose content. On top of that, content ‘scrapers’, article spinners and an abundance of black hat internet marketers looking to make a quick buck make content even harder to police.
This is a problem, although there are various web sites – such as Copyscape – that can be used to determine if someone is stealing your content, and action can be taken.
Unique content plays a big part in the way in which the search engines rank your content, and is therefore a big plus to your SEO efforts. Fresh – that is to say updated or relevant – content is favored massively over content that has been sitting dormant for years. Cue note to self to update some older blog posts with a little more fresh content.
Duplicate content, on the other hand, is content that appears in more than one place. If you are unaware of how your content management system works, you could unwittingly be creating duplicate content. The same content can appear, to the search engines, in various places on your own web site.
In terms of your search engine rankings, it is crucial to solve any on-site issues. Failure to do so can result in the search engines not knowing which page to rank for your keywords or, worse still, not ranking your web pages at all.
Assuming that you have minimized the use of similar content within your own site, then how can the search engines still find duplicate content?
The main problem is the way that the search engines read your Uniform Resource Locator, or “URL”. For example, “http://google.com”, “www.google.com” and “google.com” all look the same to a visitor, but the search engines can treat each distinct URL as a separate page. The problem is magnified when you look further at the architecture of your site: dynamic pages, categories, print-friendly pages, session IDs and even capitalization of letters can all have an impact.
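To see how these variants pile up, here is a rough Python sketch of the kind of tidying-up that could be done before comparing URLs. The normalize function, the SESSION_PARAMS list and the choice of preferred host are my own assumptions for illustration, not anything the search engines have published.

```python
from urllib.parse import urlsplit, urlunsplit, parse_qsl, urlencode

# Hypothetical names of query parameters that only carry session state.
SESSION_PARAMS = {"sessionid", "sid", "phpsessid"}


def normalize(url, preferred_host):
    """Collapse common spelling variants of a URL into one form."""
    # A bare "google.com" has no scheme, so add one before parsing.
    if "://" not in url:
        url = "http://" + url
    parts = urlsplit(url)

    # Host names are case-insensitive; treat the bare domain and the
    # "www." host as the same site and map both to the preferred host.
    host = parts.netloc.lower()
    bare = preferred_host[4:] if preferred_host.startswith("www.") else preferred_host
    if host in (bare, "www." + bare):
        host = preferred_host

    # Drop a trailing slash, but leave the case of the path alone because
    # paths can be case-sensitive on the server.
    path = parts.path.rstrip("/") or "/"

    # Remove session IDs, which change the URL without changing the page.
    query = [(k, v) for k, v in parse_qsl(parts.query)
             if k.lower() not in SESSION_PARAMS]

    return urlunsplit((parts.scheme.lower(), host, path, urlencode(query), ""))


variants = [
    "http://google.com",
    "www.google.com",
    "google.com",
    "http://WWW.Google.com/?sid=abc123",
]
unique = {normalize(u, "www.google.com") for u in variants}
print(unique)
```

All four spellings collapse into a single URL here, which is the outcome you want the search engines to reach for your own pages.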
According to Google, examples of non-malicious duplicate content could include:
Discussion forums that can generate both regular and stripped-down pages targeted at mobile devices
Store items shown or linked via multiple distinct URLs
Printer-only versions of web pages
The solution to this issue is best explained with a working example, so consider this:
A web site about bouncing balls has a page about green bouncing balls, located at –
http://bouncingballs.com/greenbouncingballs
The content of which can also be found by the search engines via the products page –
http://bouncingballs.com/products/greenbouncingballs
It is the same page, but because of the way the URL is generated, to the search engines it appears to be two pages within the same site hosting the same content; in other words, duplicate content!
To confuse matters further still, the search engines would find further content issues when crawling “www.bouncingballs.com/greenbouncingballs”. If only there were some way to inform the search engines that this is an issue with the URL as opposed to a malicious duplicate content issue. But wait… there is.
There are a number of ‘work-around’ ways to solve the problem of on-site duplicate content; however, the best way is to make things right permanently. In the case of the working example, this is a two-step procedure.
First of all, you need to solve the “www” versus non-“www” issue and avoid confusion as to whether your site is located at “http://bouncingballs.com” or “http://www.bouncingballs.com”.
A “301 redirect” in your .htaccess file is the answer (example below). This is a permanent instruction to your web server that it should always redirect “http://bouncingballs.com” to “http://www.bouncingballs.com”. Simply choose which version you would like to use, and then STICK WITH IT. It is important that you are consistent with your internal linking, so ensure that all your links go to that version.
The following code can be used to 301 redirect your site (this assumes an Apache server with mod_rewrite enabled); just replace “yoursite” with your own domain, and then paste it into your .htaccess file.
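RewriteEngine on
# Match requests for the bare domain only (case-insensitive)
RewriteCond %{HTTP_HOST} ^yoursite\.com$ [NC]
# Permanently redirect to the www version, keeping the requested path
RewriteRule ^(.*)$ http://www.yoursite.com/$1 [R=301,L]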
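If you would like to sanity-check what that rule does before touching a live server, the logic can be mimicked in a few lines of Python. This is only a sketch of the rule's behavior, assuming the request arrives with a plain host name (no port number); redirect_target is a made-up helper, not part of Apache.

```python
import re


def redirect_target(host, path, bare_domain="yoursite.com"):
    """Return the 301 target for a request, or None if no redirect applies."""
    # The RewriteCond step: a case-insensitive match on the bare host only.
    if not re.fullmatch(re.escape(bare_domain), host, flags=re.IGNORECASE):
        return None
    # The RewriteRule step: keep the requested path. In an .htaccess file
    # the path is matched without its leading slash, so strip it here and
    # add a single slash back after the host.
    return "http://www." + bare_domain + "/" + path.lstrip("/")


print(redirect_target("yoursite.com", "/greenbouncingballs"))
print(redirect_target("www.yoursite.com", "/greenbouncingballs"))
```

The first request is sent on to the “www” version with its path intact, while the second is already on the preferred host and is left alone.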
The second step to solve the working example would be to inform the search engines that you are aware of the URL issues within your site. Otherwise known as “canonicalization”, this is an SEO best practice for identifying your preferred URL to the search engines and users alike.
A simple HTML link tag can be used within your pages to let the search engines know that “http://bouncingballs.com/products/greenbouncingballs” is indeed the same as “http://bouncingballs.com/greenbouncingballs” and that all credit should go to your preferred version.
Placing a link tag with rel="canonical" in the head section of your duplicate page tells the search engines which version you prefer, so the duplicate should not be held against you. For example, the tag <link rel="canonical" href="http://bouncingballs.com/greenbouncingballs" /> would be placed in the head section of the page at “http://bouncingballs.com/products/greenbouncingballs”, meaning that the search engines would credit your preferred URL with the content.
If rel="canonical" is applied to all instances of on-site content duplication, then you can expect to see a noticeable improvement in your search engine rankings.
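To check that a page really carries the tag, a short Python sketch using the standard library's HTML parser can pull the canonical URL out of a page's markup. The CanonicalFinder class, the find_canonical helper and the sample page are all hypothetical, made up for the bouncing balls example.

```python
from html.parser import HTMLParser


class CanonicalFinder(HTMLParser):
    """Remember the href of the first <link rel="canonical"> tag seen."""

    def __init__(self):
        super().__init__()
        self.canonical = None

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        # rel can hold several space-separated values, in any letter case.
        rel_values = (attributes.get("rel") or "").lower().split()
        if tag == "link" and "canonical" in rel_values and self.canonical is None:
            self.canonical = attributes.get("href")


def find_canonical(html):
    """Return the canonical URL declared in the markup, or None."""
    finder = CanonicalFinder()
    finder.feed(html)
    return finder.canonical


# Made-up markup for the duplicate page under /products/.
page = """<html><head>
<title>Green Bouncing Balls</title>
<link rel="canonical" href="http://bouncingballs.com/greenbouncingballs" />
</head><body>...</body></html>"""

print(find_canonical(page))
```

Running this against each of your duplicate URLs is a quick way to confirm that they all point at the same preferred version.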
More information can be found on the Google Webmaster pages. Watch the video there; it will really help you understand the basics, and the reasons for doing this. Also, if you are using WordPress, there are several plugins that will assist you in your quest to cut down your on-site duplicate content issues.
