Soft 404s and Search Engine Clues by Nathan Buggia

March 5, 2008 at 3:30 pm
filed under Simple SEO

Here’s a quick recap of Nathan Buggia’s session at Mix08. Great review, Nathan: it was clean, simple, and a great session.

Crawling
What does the robots.txt file say?
Can we download this page? Are there links on it? How does this page relate to other sites about this topic? Is there anything unusual about this page?
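The first question, "may we download this page?", comes down to the robots.txt rules. Here is a minimal sketch using Python's standard `urllib.robotparser`. The robots.txt contents, the `can_crawl` helper, and the default `msnbot` user agent are assumptions for illustration, not anything shown in the session.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt for an example site (not from the talk).
ROBOTS_TXT = """\
User-agent: *
Disallow: /admin/
Allow: /
"""

def can_crawl(url: str, user_agent: str = "msnbot") -> bool:
    """Answer the crawler's first question: may we download this page?"""
    parser = RobotFileParser()
    # parse() accepts the file's lines directly, so no network fetch is needed.
    parser.parse(ROBOTS_TXT.splitlines())
    return parser.can_fetch(user_agent, url)

print(can_crawl("http://www.example.com/products/zune"))  # True
print(can_crawl("http://www.example.com/admin/login"))    # False
```

A real crawler would fetch `/robots.txt` from each host (e.g. with `set_url()` and `read()`) and cache the result rather than parsing a hard-coded string.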

Each page is then broken into pieces (images, meta tags, etc.) that are stored separately.
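To make "chunked into bits" concrete, here is a minimal sketch built on Python's standard `html.parser`. The `PageChunker` class and the four buckets it fills (title, meta tags, images, text) are hypothetical simplifications; the session did not describe how MSN actually stores the pieces.

```python
from html.parser import HTMLParser

class PageChunker(HTMLParser):
    """Split one HTML page into separately stored pieces (simplified illustration)."""

    def __init__(self) -> None:
        super().__init__()
        self.chunks = {"title": "", "meta": {}, "images": [], "text": []}
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "title":
            self._in_title = True
        elif tag == "meta" and attrs.get("name"):
            # Store each named meta tag, e.g. "description" -> its content.
            self.chunks["meta"][attrs["name"].lower()] = attrs.get("content") or ""
        elif tag == "img" and attrs.get("src"):
            self.chunks["images"].append(attrs["src"])

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        text = data.strip()
        if not text:
            return  # skip whitespace between tags
        if self._in_title:
            self.chunks["title"] += text
        else:
            self.chunks["text"].append(text)

html = """<html><head><title>Zune 80</title>
<meta name="description" content="The Zune 80 GB player."></head>
<body><h1>Zune 80</h1><img src="/img/zune.jpg"><p>Now in black.</p></body></html>"""

chunker = PageChunker()
chunker.feed(html)
print(chunker.chunks)
```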

Ranking
Inbound links are key. Each link pointing to a page acts as an endorsement of that page, and there are a lot of them to weigh: the MSN index, for example, covers ~20B pages.

The slide illustrated this with big arrows and small arrows for inbound links, plus bad orange arrows for bad sites linking to you.
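The talk didn't give a formula, but a common way to model "links as endorsements" is a PageRank-style iteration, where every page splits its score among the pages it links to. The sketch below is that generic textbook technique, not MSN's ranking algorithm; the `link_scores` function, the 0.85 damping factor, and the three-page graph are assumptions for illustration.

```python
def link_scores(links: dict, damping: float = 0.85, iterations: int = 50) -> dict:
    """Simplified PageRank-style scoring over a {page: [pages it links to]} graph."""
    pages = set(links) | {t for targets in links.values() for t in targets}
    n = len(pages)
    score = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        # Every page gets a small baseline share, then endorsements flow along links.
        new = {p: (1 - damping) / n for p in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                share = damping * score[page] / len(targets)
                for t in targets:
                    new[t] += share
            else:
                # Page with no outbound links: spread its score evenly so totals stay at 1.
                for p in pages:
                    new[p] += damping * score[page] / n
        score = new
    return score

# Hypothetical link graph: both other pages endorse "zune-review".
graph = {
    "blog": ["zune-review"],
    "news": ["zune-review", "blog"],
    "zune-review": ["news"],
}
for page, s in sorted(link_scores(graph).items(), key=lambda kv: kv[1], reverse=True):
    print(f"{page}: {s:.3f}")
```

Links from low-quality sites (the orange arrows) are not modeled here; a real engine would discount or penalize them.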

Domain name structure – MSN treats msn.com/realestate differently than realestate.msn.com
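The difference is visible in the URL itself: a subdomain changes the host, while a directory only changes the path. This small sketch with the standard `urllib.parse` shows how a system that groups pages by hostname would see the two layouts as different sites. The `site_key` helper is hypothetical and not how MSN actually makes the decision.

```python
from urllib.parse import urlsplit

def site_key(url: str) -> tuple:
    """Return (host, first path segment) so the two URL layouts can be compared."""
    parts = urlsplit(url)
    first_segment = parts.path.strip("/").split("/")[0]
    return parts.hostname or "", first_segment

print(site_key("http://msn.com/realestate"))   # ('msn.com', 'realestate')
print(site_key("http://realestate.msn.com/"))  # ('realestate.msn.com', '')
```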

Searching
What did the user enter into the search box? The engine looks across the 20B pages, RSS feeds, structured content, etc. and then orders the results based on the search term.
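As a toy illustration of "orders them based on the search term", this sketch scores each document by how often it contains the query words. The `search` function and the sample documents are hypothetical; a real engine would combine term matching with link signals like the ones above and many others.

```python
from collections import Counter

def search(query: str, docs: dict) -> list:
    """Toy ranking: order documents by how often they contain the query terms."""
    terms = query.lower().split()
    scores = {}
    for doc_id, text in docs.items():
        counts = Counter(text.lower().split())
        score = sum(counts[t] for t in terms)
        if score:  # drop documents that match none of the terms
            scores[doc_id] = score
    return sorted(scores, key=lambda d: scores[d], reverse=True)

docs = {
    "page1": "zune 80 review the zune is great",
    "page2": "real estate listings",
    "feed1": "zune firmware update",
}
print(search("zune review", docs))  # ['page1', 'feed1']
```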

Use HTML Semantically – simple is better. Use H1 and other basic HTML tags to give your pages the best structure.

Proper Use of Common Tags: <a>, <h1>, <title>, <meta>, <frame>, <table>, etc.
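One quick way to sanity-check basic structure is to count the tags a page actually uses. This sketch with the standard `html.parser` flags pages that don't have exactly one `<title>` and one `<h1>`; the `structure_warnings` helper and the one-of-each rule are assumptions for illustration, not rules stated in the session.

```python
from collections import Counter
from html.parser import HTMLParser

class TagCounter(HTMLParser):
    """Count start tags so basic page structure can be checked."""

    def __init__(self) -> None:
        super().__init__()
        self.counts = Counter()

    def handle_starttag(self, tag, attrs):
        self.counts[tag] += 1

def structure_warnings(html: str) -> list:
    counter = TagCounter()
    counter.feed(html)
    c = counter.counts  # Counter returns 0 for tags that never appeared
    warnings = []
    if c["title"] != 1:
        warnings.append(f"expected 1 <title>, found {c['title']}")
    if c["h1"] != 1:
        warnings.append(f"expected 1 <h1>, found {c['h1']}")
    return warnings

print(structure_warnings("<div><h1>Zune</h1><h1>Zune 80</h1></div>"))
```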

JavaScript – don’t lock up your navigation or other content behind JS, as search engines don’t really index text or links that only JavaScript produces.
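Here is why: a crawler that reads the raw HTML without running scripts only sees real `href` values. This sketch uses the standard `html.parser`; the `LinkExtractor` class and the sample markup are hypothetical.

```python
from html.parser import HTMLParser

class LinkExtractor(HTMLParser):
    """Collect href values the way a simple, non-JavaScript crawler would."""

    def __init__(self) -> None:
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            # javascript: pseudo-URLs lead nowhere unless a script runs.
            if href and not href.lower().startswith("javascript:"):
                self.links.append(href)

html = """
<a href="/products/zune">Zune</a>
<a href="javascript:void(0)" onclick="go('/support')">Support</a>
<span onclick="location.href='/news'">News</span>
"""
extractor = LinkExtractor()
extractor.feed(html)
print(extractor.links)  # ['/products/zune']
```

The `/support` and `/news` pages are only reachable through script, so this crawler never discovers them.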

RIA Strategy Guide (Rich Internet Applications)
Monolithic – best used when only the entry point needs to be indexed.
Linkable – the full site is indexed, which also enables metrics, reporting, etc.
Crawlable – the full site is indexed and ranks well. A good example is finance.yahoo.com.

Tips and Tricks
Search engines can’t crawl ASP.NET postbacks, so content that is only reachable through a postback won’t be indexed.

Make sure all your pages have the correct meta tags (such as the meta description). All the major search engines use them for the teaser text in their results.
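This sketch shows why a missing description hurts. It picks the meta description for the teaser when one exists and otherwise falls back to whatever text starts the page. The `teaser` function and its 150-character limit are hypothetical simplifications; each engine has its own snippet logic.

```python
def teaser(meta: dict, body_text: str, limit: int = 150) -> str:
    """Pick result teaser text: meta description if present, else the start of the body."""
    text = meta.get("description", "").strip() or body_text.strip()
    if len(text) <= limit:
        return text
    # Cut at the last space before the limit so words aren't split.
    return text[:limit].rsplit(" ", 1)[0] + "..."

print(teaser({"description": "Zune 80 GB player, now in black."}, "Welcome! Click here"))
print(teaser({}, "Welcome! Click here"))  # no description: navigation text becomes the teaser
```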

When redirecting old pages, be careful with client-side redirects (a meta refresh or JavaScript). A server-side 301 redirect is much better.
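A server-side 301 puts the redirect in the HTTP status and `Location` header, which a crawler sees without running any page code. Here is a minimal sketch as a WSGI app using Python's standard `wsgiref`. The URL maps and `www.example.com` host are hypothetical; on IIS or Apache you would normally configure this in the server instead.

```python
from wsgiref.simple_server import make_server

# Hypothetical map of retired URLs to their replacements.
REDIRECTS = {
    "/old-zune-page.aspx": "http://www.example.com/products/microsoft-zune-80-G2",
}
# Hypothetical live pages.
PAGES = {
    "/products/microsoft-zune-80-G2": b"<h1>Zune 80</h1>",
}

def app(environ, start_response):
    path = environ.get("PATH_INFO", "/")
    if path in REDIRECTS:
        # Permanent redirect decided on the server, visible in the response headers.
        start_response("301 Moved Permanently", [("Location", REDIRECTS[path])])
        return [b""]
    if path in PAGES:
        start_response("200 OK", [("Content-Type", "text/html; charset=utf-8")])
        return [PAGES[path]]
    # Unknown pages get a real 404, not a 200.
    start_response("404 Not Found", [("Content-Type", "text/plain; charset=utf-8")])
    return [b"Not Found"]

if __name__ == "__main__":
    with make_server("localhost", 8000, app) as server:
        server.serve_forever()
```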

Linking and URLs
store.com/products/microsoft_zune_80_G2 (better)
store2.com/products/microsoft-zune-80-G2 (best)
store3.com/products.aspx?id=23weqwe (worst)
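URLs in the "best" style can be generated from product names. This is a minimal sketch using the standard `re` and `unicodedata` modules. The `slugify` helper is hypothetical, and it preserves case to match the example above.

```python
import re
import unicodedata

def slugify(name: str) -> str:
    """Turn a product name into a readable, hyphen-separated URL segment."""
    # Fold accented characters to plain ASCII where possible (é -> e).
    ascii_name = unicodedata.normalize("NFKD", name).encode("ascii", "ignore").decode("ascii")
    # Replace each run of non-alphanumeric characters with a single hyphen.
    return re.sub(r"[^A-Za-z0-9]+", "-", ascii_name).strip("-")

print("store2.com/products/" + slugify("microsoft zune 80 (G2)"))
# store2.com/products/microsoft-zune-80-G2
```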

HTTP Status Codes
200 – Page was OK
404 – Page wasn’t found. Be careful not to return a 200 for a page that truly doesn’t exist (a “soft 404”). Best practice is to 301 the missing URL instead, e.g. to a newsletter page.
500 – Internal Server Error
304 – Not Modified. Basically a bandwidth saver :) The server must support conditional GET to use it.
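The codes above can be combined into one decision per request. This sketch returns a status and headers for a path. It uses a 301 for retired URLs (pointing at a newsletter page, as suggested above), a real 404 for unknown ones, and a 304 when the client's `If-Modified-Since` shows its copy is current. The `choose_status` function, page data, and URLs are hypothetical; only the `email.utils` date helpers are standard library.

```python
from datetime import datetime, timezone
from email.utils import format_datetime, parsedate_to_datetime

# Hypothetical site data: live pages with last-modified times, plus retired URLs.
PAGES = {"/products/microsoft-zune-80-G2": datetime(2008, 3, 1, tzinfo=timezone.utc)}
RETIRED = {"/products/zune-30": "http://www.example.com/newsletter"}

def choose_status(path: str, headers: dict) -> tuple:
    if path in RETIRED:
        return 301, {"Location": RETIRED[path]}
    if path not in PAGES:
        # Never answer 200 here: that would be a soft 404.
        return 404, {}
    modified = PAGES[path]
    since = headers.get("If-Modified-Since")
    if since:
        try:
            if modified <= parsedate_to_datetime(since):
                return 304, {}  # client's cached copy is current; send no body
        except (TypeError, ValueError):
            pass  # malformed or naive date: fall through and send the full page
    return 200, {"Last-Modified": format_datetime(modified, usegmt=True)}

status, hdrs = choose_status("/products/microsoft-zune-80-G2", {})
print(status, hdrs)
print(choose_status("/products/microsoft-zune-80-G2", {"If-Modified-Since": hdrs["Last-Modified"]}))
```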

Canonicalization
What’s the difference?
www.visitmix.com
visitmix.com
visitmix.com/default.aspx

Because of the way DNS works, each of these can resolve to a totally different site. Choose one option and stick with it, and be sure to send all the other variants to the main site.

The best way to avoid diluting your page rank is to 301 all non-www requests to the www version.
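Here is a minimal sketch of that rule using the standard `urllib.parse`. It computes the URL to 301 to, or `None` if the request is already canonical. Treating `/default.aspx` as an alias of the root follows the visitmix.com example above; the `canonical_redirect` function and `ROOT_ALIASES` set are hypothetical, and this sketch drops any port number.

```python
from typing import Optional
from urllib.parse import urlsplit, urlunsplit

CANONICAL_HOST = "www.visitmix.com"
# Paths treated as aliases of the site root (an assumption for this sketch).
ROOT_ALIASES = {"/default.aspx"}

def canonical_redirect(url: str) -> Optional[str]:
    """Return the URL to 301 to, or None if the request is already canonical."""
    parts = urlsplit(url)
    host = (parts.hostname or "").lower()
    original_path = parts.path or "/"
    path = "/" if original_path.lower() in ROOT_ALIASES else original_path
    if host == CANONICAL_HOST and path == original_path:
        return None
    return urlunsplit((parts.scheme or "http", CANONICAL_HOST, path, parts.query, parts.fragment))

for u in ("http://visitmix.com/", "http://www.visitmix.com/default.aspx", "http://www.visitmix.com/"):
    print(u, "->", canonical_redirect(u))
```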
