This is a guest post by Terry Van Horne. Terry is the founder of SeoPros and a 15-year veteran of Web development, currently working out of his consulting and development firm, International Website Builders. Terry's interests are primarily the socialization of search and the analysis of social Web traffic and applications like Twitter.
This post was inspired by a discussion in the SEO Dojo forums and by the Google Webmaster Central blog post announcing that microdata is supported for Rich Snippets. Microdata is part of the HTML5 specification. Over the years, not much has changed in terms of IA (information architecture).
The only really big change is the use of MRLs (machine readable languages) for distribution and for use by Google's Rich Snippets. The Salmon Protocol is an innovative protocol that could substantially reduce Internet traffic, scrapers and other nonsense made worse by robots.txt, which is a bandage solution for an open artery.
The discussion in the SEO Dojo was about "best practices" for IA and the differences between the IA of e-commerce and lead generation sites. Future-proofing my promotion strategy is always a chief concern for me. That's why I look at patents and technology instead of trying to reverse-engineer the ever-changing search engine algos.
Best Practices For Structuring Your Content
The best practices listed below are the keys to building scalable websites that easily adapt to growing markets and products/niches. In the case of e-commerce, this means the hardware and website can handle exponential growth in both traffic and products. Lead generation sites have fewer potential issues because there are generally fewer products and conversion pages.
- Hub and spoke design
- All content found within two clicks of home page
- Assess the role of machine readable languages for IA, e.g., microformats and other protocols
- Build out IA for users not search engines
A classic website IA structure is based upon a primary page or hub that is used as a jump point for all related/themed material. An e-commerce site often has multiple hubs (specials, brands, categories, subcategories), frequently with both user-generated and in-house themed content supporting the sale of many items.
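The multiple-hub idea can be pictured as a link graph. The Python below is a minimal sketch under stated assumptions: the catalog, the URL patterns (`/brand/...`, `/category/...`, `/product/...`) and the choice of brand and category as the hub types are all hypothetical. Each product becomes a spoke linked from every hub it belongs to, and each spoke links back to its hubs.

```python
from collections import defaultdict

# Hypothetical catalog: each product records the attributes that define its hubs.
CATALOG = [
    {"slug": "trail-shoe", "brand": "acme", "category": "footwear"},
    {"slug": "road-shoe", "brand": "acme", "category": "footwear"},
    {"slug": "rain-jacket", "brand": "nimbus", "category": "outerwear"},
]

# Assumed hub types; a "specials" hub would be added the same way.
HUB_ATTRIBUTES = ("brand", "category")


def build_hub_and_spoke(catalog):
    """Return an adjacency map: page URL -> sorted list of URLs it links to."""
    links = defaultdict(set)
    for product in catalog:
        product_url = f"/product/{product['slug']}"
        for attr in HUB_ATTRIBUTES:
            hub_url = f"/{attr}/{product[attr]}"
            links["/"].add(hub_url)          # home page links to every hub
            links[hub_url].add(product_url)  # each hub links out to its spokes
            links[product_url].add(hub_url)  # each spoke links back to its hubs
    return {page: sorted(targets) for page, targets in links.items()}


graph = build_hub_and_spoke(CATALOG)
print(graph["/brand/acme"])
```

Because a product sits under several hubs, the same page is reachable from more than one jump point, which is what makes cross-linking cheap in this design.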
One of the advantages of a hub and spoke IA design is that it makes it easy to cross-link related products/content, producing a link architecture capable of two-click content discovery by users. Cross-linking related content, while also linking all of it from the hub ("indexing") page, gives users multiple ways to find what they are looking for.
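A two-click rule is easy to audit mechanically. The sketch below assumes you can export each page's outgoing internal links into a dictionary and have a full page inventory (both hypothetical here). It runs a breadth-first search from the home page to find each page's minimum click depth, then lists pages deeper than two clicks or not reachable at all.

```python
from collections import deque

# Hypothetical export of internal links: page URL -> URLs it links to.
SITE_LINKS = {
    "/": ["/category/footwear", "/specials"],
    "/category/footwear": ["/product/trail-shoe"],
    "/specials": ["/product/trail-shoe"],
    "/product/trail-shoe": ["/product/trail-shoe/reviews"],
}

# Hypothetical full page inventory (e.g., from the CMS), including an orphan page.
ALL_PAGES = set(SITE_LINKS) | {"/product/trail-shoe/reviews", "/orphan"}


def click_depths(links, home="/"):
    """Breadth-first search: minimum number of clicks from home to each reachable page."""
    depths = {home: 0}
    queue = deque([home])
    while queue:
        page = queue.popleft()
        for target in links.get(page, ()):
            if target not in depths:  # first visit in BFS order is a shortest path
                depths[target] = depths[page] + 1
                queue.append(target)
    return depths


def pages_beyond(links, all_pages, max_clicks=2, home="/"):
    """Pages deeper than max_clicks, or unreachable from home entirely."""
    depths = click_depths(links, home)
    return sorted(p for p in all_pages if depths.get(p, float("inf")) > max_clicks)


print(pages_beyond(SITE_LINKS, ALL_PAGES))  # → ['/orphan', '/product/trail-shoe/reviews']
```

Orphaned pages show up because they exist in the inventory but are never linked; deep pages show up because their shortest path from home is longer than the limit.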
In the past, the role of IA in indexing was limited to the link architecture (which is why hub and spoke gained early popularity), but now that all major engines support them, XML sitemaps are another consideration. IA also plays a role in promotion, especially for e-commerce sites, where XML and RSS product feeds are common means of distributing information.
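As an illustration of the XML sitemap side, here is a minimal sketch that renders a list of URLs in the sitemaps.org format with Python's standard `xml.etree.ElementTree`. The URLs and dates are hypothetical, and the `build_sitemap` helper is illustrative rather than part of any existing tool.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(entries):
    """Render (url, lastmod) pairs as a sitemaps.org <urlset> document.

    lastmod is an optional W3C date string such as "2010-06-01"; pass None to omit it.
    """
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for loc, lastmod in entries:
        url = ET.SubElement(urlset, "url")
        ET.SubElement(url, "loc").text = loc  # ElementTree escapes "&" as "&amp;"
        if lastmod:
            ET.SubElement(url, "lastmod").text = lastmod
    body = ET.tostring(urlset, encoding="unicode")
    return '<?xml version="1.0" encoding="UTF-8"?>\n' + body


# Hypothetical URLs; a real site would pull these from its product/category database.
print(build_sitemap([
    ("https://www.example.com/", "2010-06-01"),
    ("https://www.example.com/category/footwear?sort=price&page=2", None),
]))
```

The sitemaps.org protocol caps a single sitemap file at 50,000 URLs, so a large catalog would be split across several files listed in a sitemap index.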
One of the problems I often encounter when doing site audits is poor IA caused by efforts to move content higher in the link architecture so that search engines, Google in particular, will place more value on the destination page. This only increases the number of options presented to users, often resulting in confusion.
How does information architecture vary by site type (e-commerce, lead gen, etc.)?
- IA is determined by type of site
- Offer and calls to action (CTA) locations are part of my IA assessment/plan
- Content distribution solutions
The biggest difference between lead generation and e-commerce sites is the number of products offered. A lead generation site generally has far fewer items to convert, so its IA structure is flat. The IA structure for e-commerce is more complicated because there is generally more content, with varying types of related information.
I like to include CTA and offer locations in my information architecture plan. The reason is simple: I like these to be presented in multiple page locations and formats (text/image/forms).
Increasingly, IA is playing a larger role in content distribution and product visibility (promotion). MRLs (machine readable languages) and Google-proposed Internet protocols may, if implemented, revolutionize content discovery as we currently know it. PubSubHubbub is an engineer's greatest wish come true. Efficient content discovery, combined with the ability to "sign" the content, removes the possibility of litigation when the aggregator uses the content in any way it wishes, including displaying it on its own site. Think no-click results.
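To show how lightweight the publisher side of PubSubHubbub is, here is a minimal sketch of the "publish" notification, in which a publisher POSTs a form-encoded `hub.mode=publish` plus one or more `hub.url` topic URLs to its hub so the hub can fetch the updated feed and push it to subscribers. The hub and feed URLs are hypothetical, and the sketch only builds the request; it does not implement any content signing.

```python
from urllib.parse import urlencode
from urllib.request import Request


def build_publish_ping(hub_url, topic_urls):
    """Build (but do not send) a PubSubHubbub 'publish' notification for updated feeds."""
    params = [("hub.mode", "publish")] + [("hub.url", topic) for topic in topic_urls]
    body = urlencode(params).encode("ascii")
    return Request(
        hub_url,
        data=body,
        method="POST",
        headers={"Content-Type": "application/x-www-form-urlencoded"},
    )


# Hypothetical hub and feed URLs.
req = build_publish_ping("https://hub.example.com/", ["https://www.example.com/feeds/products.xml"])
print(req.get_method(), req.data)
# To actually notify the hub: urllib.request.urlopen(req)
```

Repeating `hub.url` lets one ping cover several updated feeds, for example a product feed and a specials feed.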
How do you build it to scale?
There are two parts to building to scale: the IA and the hardware. Your IA must be able to scale as the amount and diversity of content topics expand; if you follow the best practices above, your architecture is ready to scale to any size, provided the hardware can cope with the extra processing and the larger dataset.
Do you think in terms of general themes, or do you semantically structure your content?
When you build for users, IMO, it naturally results in a thematic structure. However, at the page level it is imperative to structure the content semantically to enhance usability. For instance, using heading tags to segment the page topically enhances the page's scannability and usability.
Using appropriate microformats improves usability for crawlers and other devices/clients by supplying a sort of metadata for the information. Although search engines do not yet use this extensively, recent patent filings indicate a definite trend toward using more machine readable languages in discovery and indexing.
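To make the "metadata for crawlers" point concrete, here is a minimal sketch of how a machine might read microdata attributes (`itemscope`, `itemtype`, `itemprop`) from a page using only Python's standard `html.parser`. The markup is hypothetical and schema.org's Product type is used purely as an illustrative vocabulary. The parser handles only a single, flat item whose properties are plain text or a `content` attribute; real microdata consumers also handle nested items and element-specific values such as `href`, `src` and `datetime`.

```python
from html.parser import HTMLParser

# Hypothetical product markup annotated with microdata.
SAMPLE = """
<div itemscope itemtype="http://schema.org/Product">
  <h2 itemprop="name">Trail Shoe</h2>
  <span itemprop="brand">Acme</span>
  <meta itemprop="sku" content="TS-100">
  <p>Free shipping on orders over $50.</p>
</div>
"""


class ItempropExtractor(HTMLParser):
    """Collect the itemtype and itemprop values of a single, flat microdata item."""

    def __init__(self):
        super().__init__()
        self.itemtype = None
        self.props = {}
        self._current = None  # itemprop name waiting for its text content

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)  # boolean attributes like itemscope map to None
        if "itemscope" in attrs and self.itemtype is None:
            self.itemtype = attrs.get("itemtype")
        if "itemprop" in attrs:
            if "content" in attrs:  # e.g. <meta itemprop=... content=...>
                self.props[attrs["itemprop"]] = attrs["content"]
            else:
                self._current = attrs["itemprop"]

    def handle_endtag(self, tag):
        self._current = None

    def handle_data(self, data):
        if self._current and data.strip():
            self.props[self._current] = data.strip()
            self._current = None


parser = ItempropExtractor()
parser.feed(SAMPLE)
print(parser.itemtype, parser.props)
```

The paragraph without `itemprop` is ignored, which is the point: only the annotated facts become structured data a crawler can use.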
How do you deal with cross-linking content?
I cross-link for users: if I believe users of one area of the site would like another, it is poor design not to link them. Have I cross-linked for search engines? Yep, and the technique worked for quite a while; however, its effectiveness has diminished steadily since the Florida update, the decline accelerated with Big Daddy, and Caffeine points to further decline.
Basically, I couldn't care less how my IA affects search engines, because sorting out duplicate content is the search engines' problem. That said, Google in particular is increasingly forcing me to design my information architecture to benefit them/me. But that's for another time.
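Cross-linking for users usually starts from some notion of relatedness. One simple way to suggest candidates is Jaccard similarity over each page's tag set, a technique chosen here for illustration rather than one this post prescribes. The pages, tags, `limit` and `min_score` threshold below are hypothetical, and an editor would still decide which suggested links actually help users.

```python
# Hypothetical pages and their topical tags.
PAGES = {
    "trail-shoe": {"footwear", "running", "outdoor"},
    "road-shoe": {"footwear", "running"},
    "rain-jacket": {"outerwear", "outdoor"},
    "gift-card": {"gifts"},
}


def jaccard(a, b):
    """Size of the tag overlap divided by the size of the combined tag set."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def related_pages(pages, slug, limit=3, min_score=0.2):
    """Return up to `limit` other pages whose tag similarity is at least `min_score`."""
    own = pages[slug]
    scored = [(jaccard(own, tags), other) for other, tags in pages.items() if other != slug]
    scored = [pair for pair in scored if pair[0] >= min_score]
    scored.sort(key=lambda pair: (-pair[0], pair[1]))  # best match first, ties by name
    return [other for _, other in scored[:limit]]


print(related_pages(PAGES, "trail-shoe"))  # → ['road-shoe', 'rain-jacket']
```

The threshold keeps unrelated pages (here, the gift card) out of the suggestions, so cross-links stay useful rather than simply multiplying options.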