Thursday, January 14, 2010

How to reduce duplicate content issues through URL canonicalization?

The web is full of duplicate content. Search engines try to index and display the original or “canonical” version. Searchers only want to see one version in results. And site owners worry that if search engines find multiple versions of a page, their link credit will be diluted and they’ll lose ranking.

For example, most people would consider these URLs to be the same page:

  • www.example.com
  • example.com/
  • www.example.com/index.html
  • example.com/home.html

But technically all of these URLs are different. A web server could return completely different content for all the URLs above.

Q: So how do I make sure that search engines pick the URL that I want?

A: URL canonicalization

Canonicalization is the process of picking the best URL when there are several choices, and it usually refers to home pages.

The goal of the canonicalization process is to transform a URL into a normalized or canonical URL so it is possible to determine if two syntactically different URLs are equivalent. Search engines employ URL canonicalization in order to assign importance to web pages and to reduce indexing of duplicate pages. Web crawlers perform URL canonicalization in order to avoid crawling the same resource more than once. Web browsers may perform canonicalization to determine if a link has been visited or to determine if a page has been cached.
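As a rough illustration of what "transforming a URL into a canonical form" can mean, here is a minimal Python sketch that maps the four example URLs above onto a single form. The preferred host, the host aliases, and the list of default documents are assumptions made for this example; real search engines and crawlers apply their own, more elaborate rules.

```python
from urllib.parse import urlsplit, urlunsplit

# Hypothetical site policy for this sketch, not rules used by any search engine.
PREFERRED_HOST = "www.example.com"
HOST_ALIASES = {"example.com", "www.example.com"}
DEFAULT_DOCUMENTS = {"/index.html", "/home.html"}


def canonicalize(url: str) -> str:
    """Map a URL variant to one normalized form under the policy above."""
    if "://" not in url:
        url = "http://" + url  # accept bare "example.com/" style input
    parts = urlsplit(url)
    # hostname is lowercased here; any port is dropped in this sketch
    host = (parts.hostname or "").lower()
    if host in HOST_ALIASES:
        host = PREFERRED_HOST
    path = parts.path or "/"  # an empty path means the site root
    if path in DEFAULT_DOCUMENTS:
        path = "/"
    # Drop the fragment; keep the query string untouched.
    return urlunsplit((parts.scheme, host, path, parts.query, ""))


variants = [
    "www.example.com",
    "example.com/",
    "www.example.com/index.html",
    "example.com/home.html",
]
print({canonicalize(u) for u in variants})  # {'http://www.example.com/'}
```

Two URLs can then be treated as equivalent when their canonicalized forms compare equal.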

Q: How do I fix this duplicate content issue?

A: Make all the non-canonical URLs do a permanent (301) HTTP redirect to the canonical (preferred) URL. Suppose you want your default URL to be http://www.example.com/. You can configure your web server so that if someone requests http://example.com/, it does a 301 (permanent) redirect to http://www.example.com/. That helps Google know which URL you prefer to be canonical. Adding a 301 redirect can be an especially good idea if your site changes often (e.g. dynamic content, a blog, etc.).
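In practice this redirect is usually set up in the web server's own configuration, but the logic is simple enough to sketch in Python with the standard library's http.server module. The handler class, the helper function, and the choice of www.example.com as the canonical host are all assumptions for illustration.

```python
from http.server import BaseHTTPRequestHandler

CANONICAL_HOST = "www.example.com"  # assumed preferred host for this sketch


def redirect_location(host_header: str, path: str):
    """Return the URL to redirect to, or None if the host is already canonical."""
    host = host_header.split(":")[0].lower()  # ignore any port, compare case-insensitively
    if host == CANONICAL_HOST:
        return None
    return f"http://{CANONICAL_HOST}{path}"


class CanonicalHostHandler(BaseHTTPRequestHandler):
    """Hypothetical handler: 301 every non-canonical host to the canonical one."""

    def do_GET(self):
        target = redirect_location(self.headers.get("Host", ""), self.path)
        if target is not None:
            self.send_response(301)  # permanent redirect
            self.send_header("Location", target)
            self.end_headers()
            return
        body = b"canonical page\n"
        self.send_response(200)
        self.send_header("Content-Type", "text/plain")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)
```

To try it locally, pass CanonicalHostHandler to http.server.HTTPServer and send requests with different Host headers. Note that the path and query string are preserved, so every page on the non-canonical host redirects to its counterpart rather than to the home page.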

Q: Is there any other option to fix this duplicate content issue?

A: Yes, by using the new canonical tag.

Sometimes you can't set up permanent (301) redirects, and you can't control how people link to you. Duplicate URLs also arise from uppercase/lowercase path variations, session IDs, tracking codes, analytics parameters, and landing pages.

In those cases, you can specify the canonical version using a tag in the head section of the page.

For example:

http://www.example.com/product.php?item=swedish-fish&trackingid=1234567&sort=alpha&sessionid=5678asfasdfasfd
http://www.example.com/product.php?item=swedish-fish&trackingid=1234567&sort=price&sessionid=5678asfasdfasfd
If search engines know that these pages have the same content, they may index only one version for their search results. You can now specify a canonical page to search engines by adding a <link> element with the attribute rel="canonical" to the <head> section of the non-canonical version of the page.
To specify a canonical link to the page http://www.example.com/product.php?item=swedish-fish, create a <link> element as follows:
<link rel="canonical" href="http://www.example.com/product.php?item=swedish-fish">
Copy this link into the head section of all non-canonical versions of the page.
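If pages are generated dynamically, the tag can be produced by the same code that renders the page. The following Python sketch strips the non-content parameters from the example URLs above and builds the tag. It assumes that only the item parameter changes what the page displays; the function names and the CONTENT_PARAMS set are hypothetical.

```python
from html import escape
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Assumed for this sketch: only "item" changes what the page displays.
CONTENT_PARAMS = {"item"}


def canonical_url(url: str) -> str:
    """Drop tracking, sorting, and session parameters from the query string."""
    parts = urlsplit(url)
    kept = [(k, v) for k, v in parse_qsl(parts.query) if k in CONTENT_PARAMS]
    return urlunsplit((parts.scheme, parts.netloc, parts.path, urlencode(kept), ""))


def canonical_link_tag(url: str) -> str:
    """Build the <link> element to place in the page's head section."""
    href = escape(canonical_url(url), quote=True)  # make the URL safe inside an attribute
    return f'<link rel="canonical" href="{href}">'


print(canonical_link_tag(
    "http://www.example.com/product.php?item=swedish-fish"
    "&trackingid=1234567&sort=price&sessionid=5678asfasdfasfd"
))
# <link rel="canonical" href="http://www.example.com/product.php?item=swedish-fish">
```

Using an allow-list of content parameters, rather than a block-list of tracking parameters, means that any newly introduced tracking code is ignored by default.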
 
  
  • You can only use the tag on pages within a single site (subdomains and subfolders are fine).
  • You can use relative or absolute links, but the search engines recommend absolute links.

This tag will operate in a similar way to a 301 redirect for all URLs that display the page with this tag.

  • Links to all URLs will be consolidated to the one specified as canonical.
  • Search engines will consider this URL a “strong hint” as to the one to crawl and index.
