Let's start with the basics. You clearly have issues with duplicate content. On large sites, that is nearly impossible to avoid. Even the best content management systems don't solve every issue. At some point, either you or some other website links to a page that exists in two places. For all the reasons pointed out in the articles above, you need to fix this. Should you use a 301 redirect or the rel canonical tag?
Using the canonical tag is a Band-Aid, remember? It doesn't fix the issue. Sure, the search engines might use it to figure out that they are the same page and rank only 1 of them so you're not competing with yourself. You stopped the bleeding. Even though you used the tag, two distinct pages still exist. Since both are linked to from somewhere on the Internet, visitors will be visiting two different pages.
This means your Analytics data is split between the two pages. Analytics doesn't realize that they are the same page, because it can't read the canonical tag.
Let's take a look at it. I work on e-commerce websites, where duplicate content has been a huge issue for years. Sometimes a product sits in two different categories, like a tomato being both a fruit and a vegetable. (What's the consensus these days?) E-commerce platforms like to know how visitors found the content, so that leads to:
myfreshproducestore.com/vegetables/tomato.html
and
myfreshproducestore.com/fruits/tomato.html
The tomato.html page is the same, just found at two different URLs. It should be noted that many of the most popular e-commerce platforms have solved this issue by not using URL path tracking. The custom platform all of our clients live on is still a little antiquated and hasn't fixed this issue yet. Out of 11 different sites on the platform that I checked, roughly 35% of the pages are duplicates. On one e-commerce shoe store, 78% of the pages were duplicates. Yep, that means each page on that store exists at roughly 5 different URLs on average.
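If you want to check that arithmetic, here is a quick sketch in Python. It assumes "78% duplicates" means 78% of all URLs are surplus copies and the other 22% are the one "real" URL for each page. That reading is my assumption, and the function name is made up for illustration.

```python
def average_copies(duplicate_share: float) -> float:
    """Average number of URLs per unique page.

    Assumption: duplicate_share is the fraction of ALL URLs that are
    surplus copies, so (1 - duplicate_share) is the fraction that is unique.
    """
    if not 0 <= duplicate_share < 1:
        raise ValueError("duplicate_share must be in [0, 1)")
    return 1 / (1 - duplicate_share)


# The shoe store: 78% duplicates works out to about 4.5 URLs per page.
print(round(average_copies(0.78), 1))
# The 11-site average: 35% duplicates works out to about 1.5 URLs per page.
print(round(average_copies(0.35), 1))
```

So "5 instances" is a rounded-up figure; under this reading the exact average is closer to four and a half.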
So why does rel canonical fall short? Take a look at the Analytics data:

Each of our product pages has a specific system number. In this case, p2 sits at 37 different URLs! Obviously, this is an extreme case, but it's equally important in the tomato situation where it resides at only 2 locations.
When I want to look at page-specific data, I can't just look at 1 URL. In this case, I have to look at 37. With the tomato, I still have to look at 2. I can't take a quick gander at p2 and say, "oh, I need to do this…"
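Until the duplicates are gone, one workaround is to roll the numbers up by product yourself. The following is only a rough Python sketch of that idea. The URL pattern (system number right before ".html"), the sample rows, and the function names are all hypothetical; your export and URL structure will differ.

```python
import re
from collections import defaultdict

# Assumed URL pattern: the system number (e.g. "p2") sits right before ".html".
PRODUCT_ID = re.compile(r"-(p\d+)\.html$")


def rollup_by_product(rows):
    """Sum entrances and bounces across every URL that shows the same product."""
    totals = defaultdict(lambda: {"urls": 0, "entrances": 0, "bounces": 0})
    for path, entrances, bounces in rows:
        match = PRODUCT_ID.search(path)
        if match is None:
            continue  # not a product page under the assumed pattern
        product = totals[match.group(1)]
        product["urls"] += 1
        product["entrances"] += entrances
        product["bounces"] += bounces
    return dict(totals)


def bounce_rate(product):
    """Bounces divided by entrances, or 0.0 when there were no entrances."""
    if product["entrances"] == 0:
        return 0.0
    return product["bounces"] / product["entrances"]


# Made-up export rows: (page path, entrances, bounces).
rows = [
    ("/vegetables/tomato-p2.html", 120, 60),
    ("/fruits/tomato-p2.html", 80, 20),
    ("/vegetables/carrot-p7.html", 50, 25),
]

report = rollup_by_product(rows)
for product_id, product in sorted(report.items()):
    print(product_id, product["urls"], "URLs,", f"{bounce_rate(product):.0%}", "bounce rate")
```

It works, but notice that this is exactly the extra number crunching you would not need if each product lived at one URL.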
Of Long Tails and Landing Pages
One of the ways I look at long-tail traffic is through the total number of keywords bringing traffic and the total number of landing pages. Rand has also at times recommended this approach for discovering better indexation numbers. Look at how not fixing the real issue behind duplicate content can screw up that data:

The problem with that 2,792 number? There are only 500 products on the site and fewer than 100 category pages. How do I know which products are actually being indexed? Is Google indexing the same 400 products 7 times each and ignoring 100 of them?
If I dig deep enough and crunch enough numbers, I might be able to figure this out. If we had this duplicate content issue fixed, it would be easy as pumpkin pie to tell whether there is a problem.
In this case, canonical tags have not solved the problem. We started using them 3 months ago (after a lot of schmoozing of the tech department), and our landing page numbers are still absurdly high. This means Google is still not entirely able to figure out whether a tomato is a fruit, a vegetable, or both.
Clearly, in our case the root of the problem is with our platform. Implementing canonical tags has seemingly reduced the number of indexed pages by about 25%. Who would have thought we'd be hoping for fewer pages indexed? Dr. Pete's case study shows that the canonical tag does work, at least when you don't want it to. It is a temporary fix, a band-aid. To solve our duplicate content problems we need to fix our platform and use 301 redirects on all the duplicate pages.
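For the curious, the digging could look something like this. It is a hedged Python sketch, not a report from our actual data: the URL pattern, the sample landing pages, the catalog IDs, and the function name are all assumptions for illustration.

```python
import re
from collections import Counter

# Assumed URL pattern: the system number (e.g. "p2") sits right before ".html".
PRODUCT_ID = re.compile(r"-(p\d+)\.html$")


def landing_page_coverage(landing_paths, catalog_ids):
    """Return (landing-page count per product, products that never appear)."""
    copies = Counter()
    for path in landing_paths:
        match = PRODUCT_ID.search(path)
        if match:
            copies[match.group(1)] += 1
    missing = set(catalog_ids) - set(copies)
    return copies, missing


# Made-up landing pages from an analytics export.
landing_paths = [
    "/vegetables/tomato-p2.html",
    "/fruits/tomato-p2.html",
    "/salad/tomato-p2.html",
    "/vegetables/carrot-p7.html",
]
# Made-up list of every product the store sells.
catalog_ids = {"p2", "p7", "p9"}

copies, missing = landing_page_coverage(landing_paths, catalog_ids)
print(len(landing_paths), "landing pages,", len(copies), "distinct products")
print("never a landing page:", sorted(missing))
```

The gap between the raw landing page count and the distinct product count is the part the duplicate URLs are hiding.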
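The redirect itself is the easy part once you have decided which URL should win. Here is a minimal sketch as a Python WSGI app, assuming a hand-built map of duplicate paths. The map, the choice of /vegetables/ as the keeper, and the app itself are hypothetical; on a real platform this logic would more likely live in the server or platform configuration.

```python
# Hypothetical map: duplicate path -> the one URL we want to keep.
REDIRECTS = {
    "/fruits/tomato.html": "/vegetables/tomato.html",
}


def app(environ, start_response):
    """Minimal WSGI app: answer 301 for known duplicates, 200 for everything else."""
    path = environ.get("PATH_INFO", "/")
    target = REDIRECTS.get(path)
    if target is not None:
        # Permanent redirect: visitors and crawlers both end up on one URL.
        start_response("301 Moved Permanently", [("Location", target)])
        return [b""]
    start_response("200 OK", [("Content-Type", "text/plain; charset=utf-8")])
    return [f"Serving {path}".encode("utf-8")]


if __name__ == "__main__":
    # Local try-out only, using the standard library's reference server.
    from wsgiref.simple_server import make_server

    with make_server("127.0.0.1", 8000, app) as server:
        server.serve_forever()
```

Unlike the canonical tag, the visitor never lands on the duplicate URL, so Analytics only ever records the one page.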
That would mean no more looking at 37 unique URLs to figure out the bounce rate of 1 product page.
No more spreading out short and long-tail keywords over 2,800 pages instead of 600.
The rel canonical tag creates major issues with Google Analytics. It should not be your solution to duplicate content issues.