Duplicate Content Catch-22
Bloggers and ecommerce sites alike are working hard to deal with the unintended consequences of their content management systems. For all its wondrous flexibility, blogging software can create duplicate content at just about every turn: archives, categories, tags and stubs can each create, as Google puts it, “substantive blocks of content within or across domains that either completely match other content or are appreciably similar”. Google’s preferred method of dealing with duplicate content is to filter the pages it considers duplicates out of its search results.
Dupe content happens. One way is as a side effect of creating multiple paths to content: by allowing users to reach the content they want in the way they prefer (browsing, tagging, search, etc.), a CMS can create pages with identical content where the only difference is in the navigation. A flexible information architecture would allow a visitor to find a pair of, say, UGG Women’s Classic Short Boots via multiple paths: Women’s/Footwear/Boot/Sheepskin and Collections/UggAustralia/Boots/ are just two examples of valid navigational paths that could lead to the same content. The ability to organize products in multiple ways is one of the techniques that can help each visitor interact with complex site data in the way that’s most meaningful to them.
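To see how quickly this adds up, here is a minimal Python sketch that groups URL paths by a fingerprint of their main content. It assumes you can already extract each page's body with the navigation stripped out; the function name `find_duplicate_paths`, the sample paths and the sample product text are all hypothetical, made up for illustration, and exact-match hashing is only a rough stand-in for whatever Google actually does to detect "appreciably similar" pages.

```python
import hashlib
from collections import defaultdict


def find_duplicate_paths(pages):
    """Group URL paths whose main content is identical.

    `pages` maps a URL path to that page's body text with navigation removed.
    Returns a list of path groups, one per piece of duplicated content.
    """
    by_fingerprint = defaultdict(list)
    for path, body in pages.items():
        # Normalize case and whitespace so trivial differences don't hide a match.
        normalized = " ".join(body.lower().split())
        fingerprint = hashlib.sha256(normalized.encode("utf-8")).hexdigest()
        by_fingerprint[fingerprint].append(path)
    # Keep only groups where more than one path serves the same content.
    return [sorted(paths) for paths in by_fingerprint.values() if len(paths) > 1]


# Hypothetical catalog: two navigation paths lead to the same product page.
pages = {
    "/womens/footwear/boot/sheepskin/ugg-classic-short":
        "UGG Women's Classic Short Boots. Sheepskin boot.",
    "/collections/uggaustralia/boots/ugg-classic-short":
        "UGG Women's  Classic Short Boots. Sheepskin boot.",
    "/womens/footwear/sandals/some-sandal":
        "A different product entirely.",
}

duplicates = find_duplicate_paths(pages)
```

Each group in `duplicates` is a set of paths you would need to make a decision about.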
Assuming your site is indexed, once Google has identified duplicate pages, it’ll make its own decision on which page will display in a SERP; the rest will be hidden behind the “In order to show you the most relevant results, we have omitted some entries very similar to the ones already displayed” disclaimer. And while Google is correct that filtering out these additional results does improve the search experience, your site’s standing — your reputation as an authority for that item — can be diluted, since some of your inbound links will not be indexed.
What to do? In the past, the standard approach was to make a popular item available in multiple locations. The need to de-duplicate this content means that we either [a] remove all but the most popular page from the indexable files, or [b] redirect users to the most popular page, thereby focusing our traffic and links on one concentrated page.
Pick your poison. In either case, we have a conundrum where attempts to improve the search experience result in a compromised marketing position. In the first case, using your robots.txt file to exclude the secondary page[s] from being indexed has the potential to reduce the number of indexed inbound links. On the other hand, the redirect strategy means a visitor who has come to a less popular version of the page will find themselves having unwittingly touched a portkey, instantly damaging any faith in your site’s navigational reliability.
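For the concretely minded, both options can be sketched in a few lines of Python. This is an illustration under assumptions, not a recipe: the paths and the helper names `robots_txt_for` and `redirect_target` are hypothetical, and a real site would issue the redirect from its web server or CMS rather than from a lookup function. The only library call used is the standard library's `urllib.robotparser`, which here just confirms that the generated rules block what they are meant to block.

```python
from urllib.robotparser import RobotFileParser


def robots_txt_for(secondary_paths):
    """Option [a]: build robots.txt rules that keep crawlers off secondary pages."""
    lines = ["User-agent: *"]
    lines += [f"Disallow: {path}" for path in secondary_paths]
    return "\n".join(lines) + "\n"


def redirect_target(path, redirects):
    """Option [b]: return (status, location) for a requested path.

    Secondary paths get a permanent (301) redirect to the primary page;
    everything else is served as-is.
    """
    if path in redirects:
        return 301, redirects[path]
    return 200, path


# Hypothetical paths for the same product.
PRIMARY = "/womens/footwear/boot/sheepskin/ugg-classic-short"
SECONDARY = ["/collections/uggaustralia/boots/ugg-classic-short"]

# Option [a]: generate the rules, then check them with the stdlib parser.
robots_txt = robots_txt_for(SECONDARY)
parser = RobotFileParser()
parser.parse(robots_txt.splitlines())
secondary_allowed = parser.can_fetch("*", SECONDARY[0])
primary_allowed = parser.can_fetch("*", PRIMARY)

# Option [b]: map every secondary path onto the primary page.
redirects = {path: PRIMARY for path in SECONDARY}
status, location = redirect_target(SECONDARY[0], redirects)
```

Note that the redirect in option [b] is exactly the portkey: the visitor asks for the Collections path and lands somewhere under Women’s.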
If your site has products offered in different categories, you may be at risk of having Google filter out your content. If your site navigation is dependent on AJAX, the risk is even greater, since the dynamically generated “unique” content of the breadcrumb won’t be recognized during the crawl. Given the options, I’d rather take a slightly lower ranking on the SERP than risk alienating my customers while they’re in the process of shopping my site. The search leads to the site, but it’s your site’s content, branding and user experience that will convert the visit into a sale.
Posted: May 10th, 2007 under marketing, search.