There Can Only Be One: Duplicate Content & Website Impacts


What is Duplicate Content and Why is it an Issue?

Duplicate content can be defined as information that is posted online in multiple places. When multiple copies of the same content are strewn throughout the Internet, search engines have a difficult time deciding which version is most relevant to show for a search query. If you are promoting your online store and have duplicate content on your website, you may find that your marketing efforts yield weaker results.

To provide users with the best search experience, search engines rarely show duplicate versions of the same article or webpage. Instead, they have to decide which version is most likely to be the original, or sometimes simply the best.

Simply put, if you have duplicate content on your website, chances are that it is not showing up in the search engines. This not only impacts your lead generation, but also decreases the ‘power’ or domain authority that your website has for the other search terms you are attempting to rank for.

Frequent Causes for Duplicate Content:

Duplicate content can have any number of causes:

1.) Dynamically Generated Search Parameters

This error is usually caused by the stacking of modifiers in your site’s URL architecture. For example, the following URLs can all be indexed by a crawler:

• baking.com/cakes/vanilla-cake/

• baking.com/cakes/vanilla-cake%7in

• baking.com/cakes/vanilla-cake&7in=frost

Although this is a simplified example, you should check that your Content Management System (CMS) is not appending multiple parameters to your URLs that Google then crawls. Query string parameters are great for website analytics, but make sure to use the canonical tag (see below) so the search engines know which URL is the correct one to ‘count’.

On the other hand, Google can also crawl through such website navigation on its own, generating endless strings of unnecessarily long and complex URLs that can potentially be indexed.

Preventive Measures: A wildcard (*) pattern in your robots.txt file can stop certain crawlers from crawling your URLs beneath a certain subdirectory. Keep in mind that robots.txt controls crawling rather than indexing.

Example: www.baking.com/cakes/vanilla-cake/*

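Easy Fix: Adding a canonical tag (rel="canonical") that points to your preferred URL can alleviate the problem. The tag belongs in the <head> of each parameterized page. Google Search Console’s URL parameter controls, where available, are a separate setting and do not add the tag for you.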
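
As a minimal sketch of the canonical approach, each of the vanilla cake URL variants above could carry the same tag in its <head>. The https:// scheme and the www. host shown here are assumptions for illustration; use whichever form is your preferred URL.

```html
<!-- Placed in the <head> of every parameterized variant of the page. -->
<!-- The scheme and host are assumed for illustration only. -->
<link rel="canonical" href="https://www.baking.com/cakes/vanilla-cake/">
```

Because every variant names the same preferred URL, search engines can consolidate the variants instead of treating each one as a separate page.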

2.) Syndicated Content

Syndication, while a terrific way to get your information in front of new users, is another way to land yourself in a costly duplicate content issue. When you begin to syndicate content, it’s important to set guidelines for those publishing your work:

Canonical Tag

In an ideal world, you would first request that the publisher use the rel="canonical" tag that we discussed earlier. When it is placed on the article page of the publisher’s site and points to your original article, this tag tells search engines that your website is the original source of the content.

No Index

The publisher could also “noindex” the syndicated content. This directive is placed in a robots meta tag on the given webpage and keeps the syndicated copy out of search results, which corrects potential problems with duplicated content.

At the very least, request that publishers link back to your website and the original article. This provides attribution and lets Google (and other search engines) know where the content came from. Here is an example of proper attribution:

Source: Leadershipiq.com

3.) URL Structures

Be sure to check whether your website has more than one live version being indexed by a search engine.

WWW. & Non-www. URLs

Depending on the setup of your website, you will either have a ‘www.example.com’ or an ‘example.com’ domain address. These are different URLs in Google’s eyes, and Google will only show one of them. It doesn’t necessarily matter which URL you choose; you just have to make sure your messaging is consistent.

These URLs can be seen in your browser and should reflect your company’s branding strategy:

WWW. URL (national example): http://www.proflowers.com/

The non-www version of Proflowers.com 301 redirects to the www. URL

HTTP and HTTPS Protocols

Just like www and non-www URLs, HTTP and HTTPS protocols are treated as separate websites.

Before Google incentivized web developers to make their sites adhere to the HTTPS protocol, many developers chose to add HTTPS only to the web pages that needed added security features. As a result, HTTP and HTTPS versions of the same page can both end up live and indexed.

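Easy Fix: If you find that your website has duplicate versions, implement 301 redirects (see below) from each duplicate version to your preferred domain. Redirects are configured on your web server or hosting platform. Google Search Console can be used to check which version Google has indexed, but it does not create redirects.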

4.) Printer-Friendly Pages

Sometimes a site creates a duplicate, printer-friendly version of a web page. For example:

• homepage.com/page-1

• homepage.com/printer/page-1

Easy Fix: Add a canonical tag to the printer-friendly page that points to the non-printer version.
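
A sketch of how this could look for the two example URLs above, assuming the site is served over https (the scheme is an assumption, not part of the original example):

```html
<!-- In the <head> of homepage.com/printer/page-1 (the printer-friendly copy): -->
<link rel="canonical" href="https://homepage.com/page-1">

<!-- In the <head> of homepage.com/page-1 (the standard page), a
     self-referencing tag that names the page's own URL: -->
<link rel="canonical" href="https://homepage.com/page-1">
```

Both tags name the same URL; only the one on the standard page is self-referencing.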

5.) Scrape and Steal Websites

A scrape and steal website, also called a ‘scraper site’, aims to copy other websites’ content, generally with malicious or commercial intent. The purpose of such scrapers is primarily to trick the search engines into thinking that they wrote the content so that they can earn revenue from the traffic.

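Preventive Measures: Use absolute URLs instead of relative URLs when creating new webpages, so that any internal links in a scraped copy still point back to your site. This does not stop scraping, but it makes it harder for scraper sites to pass your content off as their own.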

Absolute URL: https://www.example.com/awesome-content

Relative URL: /awesome-content

Easy Fix: Finding your content on another website already? Use a self-referencing canonical tag (see below) to let Google know which website produced the content first.

Directions and Resources for Fixing Duplicate Content

1.) Canonical Tag

Using the canonical tag (rel="canonical") in the <head> of your pages will tell the search engines which version of a page you would like them to return for queries. This is the best approach to take when you have multiple versions of a page available to users. A self-referencing canonical tag is one that names the page’s own URL.

HTML Examples:

Syndicated content utilizing canonical tags on this Entrepreneur article:

<meta name="original-source" content="http://www.helpscout.net/blog/hiring-employees/" />

<link rel="canonical" href="http://www.helpscout.net/blog/hiring-employees/" />

<link rel="amphtml" href="https://www.entrepreneur.com/amphtml/237756" />

Self-referencing canonical tags from a recent TransUnion SmartMove Infographic:


<meta http-equiv="X-UA-Compatible" content="IE=edge,chrome=1">

<link rel="canonical" href="https://www.mysmartmove.com/SmartMove/blog/beginner-guide-owning-rental-property-infographic.page">

2.) Meta Tag

Robots meta tags (“noindex”, “nofollow”) are useful when you want to tell a crawler not to index a particular page on a site (“noindex”) or not to follow the links on it (“nofollow”). These work best when you want users to be able to access a page but do not want that particular page indexed by the search engines.

HTML Example:

Source: Robotstxt.org
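
A minimal sketch of a robots meta tag, placed in the <head> of the page you want kept out of search results, might look like the following. The values shown are the standard noindex and nofollow directives; include only the ones you need.

```html
<!-- Ask crawlers not to index this page and not to follow its links. -->
<meta name="robots" content="noindex, nofollow">
```

To keep a page out of the index while still letting crawlers follow its links, use content="noindex" on its own.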

3.) Implementing 301 Redirect

A 301 redirect permanently forwards a legacy URL to a new URL. It tells crawlers to rank the new URL for search queries and passes the old page’s link authority to the new one. 301 redirects also forward web traffic from the old webpage to the new one.

Whether you are about to set up your business website, or taking a fresh look at an already established domain, do not forget about the negative impacts that duplicate content can have on your company’s online viability.


Sam Wheeler
http://www.inseev.com/
Sam is a graduate of Northwestern University and an expert in the digital space. He has spent the last 5 years working for both Fortune 500 companies and startups, helping to improve their digital presence. When he is not behind the computer, you can find Sam surfing off the coast of San Diego.