Monday, April 25, 2011

What Are The Common URL-Related SEO Mistakes?


1.   Lack Of Keywords
There still appear to be two camps on this matter: those who think using keywords in the URL string is of benefit, and those who don't. I hereby grant you permission to quote me on this: at present, in Google at least, yes, it does indeed help your SEO efforts.
This is reinforced by the fact that a recent Google update saw inner pages ranking more frequently for certain search terms rather than the homepage, which historically ranked due to its higher overall authority.
With this in mind, the more weighting and relevance you can give a URL for the search terms you are targeting, the better.
Good URL Example: http://www.jessops.com/compact-system-cameras/Sony/NEX-5-Black-18-55-lens/
Bad URL Example: http://www.jessops.com/online.store/products/77650/show.html
(Sorry to pick on you here Jessops, but I use your site quite regularly and your URLs are a constant bugbear of mine!)
2.   Too Many Keywords
While having keywords in your URL string is a positive thing, too many may land you in trouble. Although I haven't come across any firm data sets showing a negative correlation between keyword stuffing in URLs and ranking positions, Google did mention a while ago that their algorithm does in fact look out for this (Matt Cutts alludes to how it can "look spammy" in this webmaster help video). Even if it isn't an overly strong ranking factor, the usability of the URLs is hampered when you begin jamming them full of target phrases.
Good URL Example: http://www.example.com/hotels/USA/north-america/florida/orlando/
Bad URL Example: http://www.example.com/hotels/USA-hotels/north-america-hotels/florida-hotels/orlando-hotels/
This is a relatively benign example and far from the worst I have seen!
3.   Semantics and Directory Structure
A logical URL structure helps not only users but also search engine spiders figure out how all the pages relate to each other and how to navigate between categories. An added benefit of well-crafted semantics is that Google can pull these into a SERP and display them in place of a sometimes confusing URL string:
[Image: URL snippet as displayed in a Google SERP]
4.   Dynamic AJAX Content
Moving on to something a little more technical now.
I have had a few new clients recently who were worried because only their top-level category pages could be found in search engines, meaning that long-tail queries were not returning their sub-categories. Upon further investigation, it turned out that they were using AJAX to generate these pages with dynamic content via the hash fragment, and so these URLs were not being indexed.
Back in 2009 Google announced that it was making changes to allow these dynamic pages to be indexed. To do so, the exclamation mark token (“!”) needs to be added after the hash (“#”) within an AJAX URL.
Non-Indexable AJAX URL: http://www.example.com/news.html#latest
Indexable AJAX URL: http://www.example.com/news.html#!latest
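Under Google's AJAX crawling scheme, when Googlebot encounters a "#!" URL it re-requests the page with the fragment passed in an _escaped_fragment_ query parameter, and your server is expected to return an HTML snapshot of that state. Here is a minimal Python/Flask sketch of the server side; the route and template names are purely illustrative, not taken from any particular site.

from flask import Flask, request, render_template

app = Flask(__name__)

@app.route('/news.html')
def news():
    fragment = request.args.get('_escaped_fragment_')
    if fragment is not None:
        # Googlebot saw /news.html#!latest and re-requested
        # /news.html?_escaped_fragment_=latest, so return a full HTML snapshot.
        return render_template('news_snapshot.html', section=fragment or 'latest')
    # Normal visitors get the usual AJAX-driven page.
    return render_template('news.html')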
5.   Canonicalise www and Non-www
Most webmasters nowadays handle this well, although it is usually by accident as their CMS does it for them. However, there are still a handful who forget. I guess it can be forgiven in some cases: after toiling for months on making a site look good and do what it is meant to do, checking that both the www and non-www versions are dealt with correctly is far from a developer's mind.
There are two issues here really, both of which have the same answer. The first is that the non-www URL is not pointing anywhere and returns a 404 ‘page not found’ error. The second issue is that the non-www URL could render the same as the www version – this would effectively create two exact copies of the same website.
The solution: ensure that the non-www version is 301 redirected to the www version, or vice versa depending on your personal preference!
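In practice this is usually handled with a rewrite rule at the web-server level, but if it has to live in the application itself, a minimal Python/Flask sketch (with example.com standing in for your own domain) looks something like this:

from flask import Flask, request, redirect

app = Flask(__name__)

@app.before_request
def force_www():
    # 301 the bare domain to the www version so only one copy of the site exists.
    if request.host == 'example.com':
        return redirect(request.url.replace('//example.com', '//www.example.com', 1), code=301)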
6.   Secure HTTPS Pages
Another potential duplicate content issue that very often goes unnoticed is http and https pages rendering the same content. This usually arises either through sloppy web development using relative URLs within the website or through an automated CMS. It is most common when a user enters a secured area of the site over https and then returns to a non-secured http page; however, the navigation retains the https prefix – usually because of relative URL links. This results in https versions being rendered for every page of the site thereafter.
To combat this, two steps need to be taken. Firstly, all navigation to non-secured pages must be http and not https. This can be achieved either by hard coding the links or by ensuring any relative URLs are removed from secured pages.
Secondly, in non-secured areas, the https versions should be 301 redirected to the correct http versions.
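As a rough sketch of that second step in Python/Flask (the SECURE_PREFIXES list is a made-up example; substitute whatever defines your secured section):

from flask import Flask, request, redirect

app = Flask(__name__)
SECURE_PREFIXES = ('/checkout', '/account')  # hypothetical secured areas

@app.before_request
def drop_https_outside_secure_areas():
    # Any https request outside the secured areas gets 301'd back to http.
    if request.is_secure and not request.path.startswith(SECURE_PREFIXES):
        return redirect(request.url.replace('https://', 'http://', 1), code=301)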
7.   Category IDs
Many of these points are interconnected, and this one builds on the advice already given regarding keyword inclusion and semantics.
Many sites utilise category IDs within their URLs, generated the majority of the time by their CMS. In a nutshell, a load of numbers, letters and symbols in a URL means absolutely nothing to either a human visitor or a search engine spider. In order to maximise the site's SEO impact, and meet the advice of including keywords and logical semantics within the URL, these IDs need to be turned into relevant descriptive text.
Many CMS platforms have this 'pretty URL' ability built in to them. However, if this facility is not available, simply map each of the IDs to a relevant handle such as a product's name or category.
Ugly URL Example: http://www.example.com/product.aspx?ID=11526&IT=5f7d3d
Pretty URL Example: http://www.example.com/dvds/anchorman-the-legend-of-ron-burgundy/
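If your CMS cannot do this for you, the mapping itself is straightforward. Here is a rough Python sketch, assuming a simple ID-to-name lookup; the product data below is invented purely for illustration.

import re

def slugify(name):
    # "Anchorman - The Legend of Ron Burgundy" -> "anchorman-the-legend-of-ron-burgundy"
    return re.sub(r'[^a-z0-9]+', '-', name.lower()).strip('-')

products = {11526: 'Anchorman - The Legend of Ron Burgundy'}  # hypothetical data

def pretty_url(product_id, category='dvds'):
    return '/%s/%s/' % (category, slugify(products[product_id]))

print(pretty_url(11526))  # /dvds/anchorman-the-legend-of-ron-burgundy/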
8.   Session IDs
Many ecommerce sites track visitors’ activities, such as adding products to shopping baskets, by appending session IDs to the end of the URLs. These IDs are necessary for visitors to interact with functionality that is user specific; however, they can result in dangerous duplicate content issues. As each ID must be unique to each visitor, this potentially creates an infinite number of duplicated website pages.
For example:
http://www.example.com/buy?id=2f5e2 and http://www.example.com/buy?id=4k3g1 will render individually, yet are potentially exactly the same page.
The best way to combat this issue is to remove the session IDs from the URL string and replace them with a session cookie. The cookie works in the same manner as the ID, but is stored on the user's machine and so does not affect the URL.
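As a minimal illustration, here is how a basket might be tied to a signed session cookie in Python/Flask rather than a ?id= parameter; the route and basket structure are hypothetical.

from flask import Flask, session, jsonify

app = Flask(__name__)
app.secret_key = 'replace-with-a-real-secret'  # signs the session cookie

@app.route('/buy/<product_slug>')
def add_to_basket(product_slug):
    # The visitor is identified by the cookie, so the URL stays the same for everyone.
    basket = session.setdefault('basket', [])
    basket.append(product_slug)
    session.modified = True  # ensure the mutated list is written back to the cookie
    return jsonify(basket=basket)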
9.   The Trailing Slash Conundrum
This is another duplicate content issue but a very subtle one. Again, many CMS platforms cope with this very well out of the box, but you need to be aware of it just in case.
The duplicate content in this case comes from a website rendering URLs both with and without the trailing slash.
For example:
Both http://www.example.com/category/product and http://www.example.com/category/product/ will render individually, yet are exactly the same page.
Correcting the issue is straightforward: a simple 301 redirect rule points all pages without a trailing slash to the version with a trailing slash.
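The rule is most often written in the web server's configuration, but the logic is the same wherever it lives. A rough Python/Flask sketch (the file-extension check is just an illustrative heuristic):

from flask import Flask, request, redirect

app = Flask(__name__)

@app.before_request
def add_trailing_slash():
    path = request.path
    # Skip file-like paths (e.g. /sitemap.xml); everything else gets a slash.
    if not path.endswith('/') and '.' not in path.rsplit('/', 1)[-1]:
        query = request.query_string.decode()
        return redirect(path + '/' + ('?' + query if query else ''), code=301)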
10.   Index File Rendering
A website will sometimes render both a root directory in the URL and the root appended with the index file (index.html, index.php, index.aspx, etc.). When this happens, each gets treated as an individual page by a search engine, resulting in both being indexed and creating duplicate content.
For example:
http://www.example.com/category/product/ and http://www.example.com/category/product/index.html will render individually, yet are exactly the same page.
This is one of the most common oversights I come across on a daily basis, and it is very simple to rectify. Similar to the trailing slash fix, a 301 redirect rule needs to be established to point one into the other. For a greater level of usability, I'd suggest redirecting the version that includes the index file into the root directory URL without it.
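A sketch of that redirect rule in Python/Flask, assuming the usual index file names:

from flask import Flask, request, redirect

app = Flask(__name__)
INDEX_FILES = ('index.html', 'index.php', 'index.aspx')

@app.before_request
def collapse_index_files():
    last_segment = request.path.rsplit('/', 1)[-1]
    if last_segment in INDEX_FILES:
        # /category/product/index.html -> /category/product/
        return redirect(request.path[:-len(last_segment)], code=301)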
BONUS TIP
11.   Subdomain URLs
This is not so much a directly SEO-related issue; however, it is one I felt I had to include in this list as it caused me headache after headache with one particular client last year, and I want you to avoid my pain!
I was browsing through Google Analytics and came across one particularly unassuming page of a client's site that was generating vast numbers of page views. Delving into the analytics a little further, after several hours of hair pulling, I discovered that it was due to a page on the main domain being named exactly the same as a page on one of their subdomains. Due to the way subdomain tracking in Google Analytics works, it was under the impression that these two very different pages were, in fact, one and the same.
For example:
http://www.example.com/page-one/ and http://sub.example.com/page-one/ are separate pages displaying different information; however, with the standard Google Analytics subdomain tracking code, all activity from both will be registered as being on the same page.
So to avoid unnecessary hair loss, ensure that all URLs on both the main domain and across all subdomains are unique, or alternatively implement a series of custom filters.

Monday, April 11, 2011

What Is The Google Panda Algorithm / Google Content Farm Update?


Google tries to wrestle back index update naming from the pundits, naming the update "Panda". Named after one of their engineers, apparently.
The official Google line - and I'm paraphrasing here - is this:
Trust us. We're putting the bad guys on one side, and the good guys on the other
I like how Wired didn't let them off the hook.
Wired persisted:
Wired.com: Some people say you should be transparent, to prove that you aren’t making those algorithms to help your advertisers, something I know that you will deny.
Singhal: I can say categorically that money does not impact our decisions.
Wired.com: But people want the proof.
This answer, from Matt Cutts, was interesting:
Cutts: If someone has a specific question about, for example, why a site dropped, I think it’s fair and justifiable and defensible to tell them why that site dropped. But for example, our most recent algorithm does contain signals that can be gamed. If that one were 100 percent transparent, the bad guys would know how to optimize their way back into the rankings

Why Not Just Tell Us What You Want, Already!

Blekko makes a big deal about being transparent and open, but Google have always been secretive. After all, if Google want us to produce quality documents their users like and trust, then why not just tell us exactly what a quality document their users like and trust looks like?
Trouble is, Google's algorithms clearly aren't that bulletproof, as Google admit they can still be gamed, hence the secrecy. Matt says he would like to think there will come a time when they could open source the algorithms, but it's clear that time isn't now.

Do We Know Anything New?

So, what are we to conclude?
  • Google can be gamed. We kinda knew that....
  • Google still aren't telling us much. No change there....
Then again, there's this:
Google have filed a patent that sounds very similar to what Demand Media does, i.e. it looks for SERP areas that are under-served by content and prompts writers to write for them.
The patent basically covers a system for identifying search queries which have low-quality content and then asking either publishers or the people searching for that topic to create some better content themselves. The system takes the volume of searches into account when assessing the quality of the content, so for bigger keywords the content would need to be better in order for Google not to need to suggest that somebody else writes something.
If Google do implement technology based on this patent, then it would appear they aren't down on the "Content Farm" model. They may even integrate it themselves.
Until then....

How To Avoid Getting Labelled A Content Farmer

The question remains: how do you prevent being labelled as a low-quality publisher, especially when sites like eHow remain untouched, yet Cult Of Mac gets taken out? Note: Cult Of Mac appears to have been reinstated, but one wonders if that was the result of the media attention, or an algo tweak.
Google want content their users find useful. As always, they're cagey about what "useful" means, so those who want to publish content, and want to rank well, but do not want to be confused with a content farm, are left to guess. And do a little reverse-engineering.
Here's a stab, based on our investigations, the conference scene, Google's rhetoric, and pure conjecture thus far:
  • A useful document will pass a human inspection
  • A useful document is not ad heavy
  • A useful document is well linked externally
  • A useful document is not a copy of another document
  • A useful document is typically created by a brand or an entity which has a distribution channel outside of the search channel
  • A useful document does not have a 100% bounce rate followed by a click on a different search result for that same search query ;)
Kinda obvious. Are we off-base here? Something else? What is the difference, as far as the algo is concerned, between eHow and Suite 101? Usage patterns?
Still doesn't explain YouTube, though, which brings us back to:
Wired.com: But people want the proof
YouTube, the domain, is incredibly useful, but some pages - not so much. Did YouTube get hammered by update Panda, too?
Many would say that's unlikely.
I guess "who you know" helps.
In the Panda update some websites got owned. Others are owned and operated by Google. :D

Thursday, April 7, 2011

What Is Link Baiting?

Link bait is content on your site to which other sites link because they want to, not because you ask them to. Traditionally, links are hard to get, requiring you to sacrifice your firstborn child (and at least link back, which nullifies their value in some search engines). But with link bait, you "bait" your content and sit back and wait. Of course, you can be a little proactive …

Link Bait Examples
Great content always serves as link bait. Breaking news often falls in that category, but so does an amazing ebook. A “How-to Guide” is another example.

Manners may buy you links. If you remember to thank your partners and competitors (or cite them), they will probably do the same for you, when the time arises.

Link bait could be a great gadget. All kinds of companies create calculators for specific purposes that become link bait. There are far too many mortgage calculators and “how much do you need to retire?” calculators, but how about a calculator that figures out what kind of reusable insulation you need in your steam room, based on pipe size? (Now that’s one of a kind, and is great bait.)

Link bait could be a widget. Widgets create a link from the site that uses them back to the site that created them – and there's the link bait again.

Pictures are link bait, too. How about pictures on your blog of the latest industry event? You might even get links from your competitors, who want to show off their faces….

So, how does link bait help you?

It creates more links to your site, which help you in the search engines. Furthermore, these links come to you — you don’t have to get on your knees and beg for them.
It creates more links to your site, which send potential customers your way. After all, the whole purpose of SEO – coming up high in the search engines – is about reaching more people.