Future Proofing Your SEO: Parallel Indexing
Today I'm going to show you how to go about "future proofing" your SEO. Some call this "algo chasing," but the only similarity to algorithm chasing is that in future proofing we learn something new or see a change in the results, review the cause, and adjust the SEO strategy to position the site to move up in the resulting flux. We never look for exploits! IMO, we should be looking for what we need to change and how to work within what we anticipate the algorithm will clearly show later.
Generally, if the change doesn't drastically disrupt the current strategy, I start implementing it immediately. In this case, some things I am already doing need more emphasis, and others may require changes to the website development strategy. (Yeah ... I said website development strategy, because for me that is the core of any SEO project.)
I have been doing SEO for longer than I care to admit, and there haven't been many times in those years when a search engine has made me think, "Hmmm, how or why did they do that?" I had one of those moments the other day watching a video from SMX Advanced. Why did I have a moment? Parallel indexing, baby! Google has removed one step in the indexing process and can now go from discovering content to including it in the results in literally seconds.
Google used to crawl the web, then process/index that content, and then update the index. Then they became able to add incremental updates, which replaced those annoying Google dances. These were often referred to as back link updates, but they weren't; in reality, all of the content was updated. I think the "incremental updates" were/are the "textual segments," and the link graph is the last "segment" to be processed, so calling it a back link update is, to some degree, true.
Think about it: only a few years ago, SEOs were happy to get monthly updates. Now Google can put content in the results in seconds! That also means buzz and social are crucial to strategies for digital asset managers, as universal SERPs and the creeping verticals leave less real estate for the old "10 blue links" content. Links matter mostly in referential query spaces. Just as links validate content in the reference/info query spaces, buzz and social sharing are the temporal validations of freshness and SERP-worthiness!
Many SEOs don't get what this indexing change means. This innovative indexing method has been met with a ho-hum "I'm more interested in knowing why Mayday is removing my thin content from the index." Easy! It is crap, dummy, and therefore only has "manufactured" citations pointing to it! Using yesterday's techniques today has never been a good strategy, and the publishers affected by Mayday seem to be the ones caught targeting long-tail terms with thin or user-generated content, or, as it is known in the Dojo, crap hat SEO.
So now the SERPs are super temporal, and buzz and social sharing are the initial measures of quality/citations, so the fact that those links are nofollow means nothing. They are the QDF (query deserves freshness) validators, and there you have it ... yet another link dampener.
Warning: Conjecture Ahead!
What I am about to share is pure conjecture on my part. It is based on the facts in the video below from the recent SMX Advanced, a video from Google I/O 2010, and a little-known protocol called the Salmon Protocol, which was developed by Googlers. Warriors in the SEO Training Dojo community know that the Gypsy and I follow a regimen we call "future proofing" SEO. When we met, it was one of the traits we recognized in each other, and one I seldom see in other SEOs. In fact, he was the first SEO I'd met who actually understood what I was talking about.
The video below explains how parallel indexing works, and if you don't grasp its significance, the rest of this will likely not make much sense. In particular, watch for the parts where Matt discusses segmentation in the indexing method, because that is crucial to my hypothesis.
Another "tell" is the industry chatter about contextual/embedded links. I always pay attention to this kind of chatter because it points to a major algorithmic change. IMO, the Mayday results come from the combination of Caffeine's segmented indexing and a less heavily weighted, segmented link graph. In fact, I think PageRank is only important in a reference/info query space, i.e., reference content, which, oddly enough, is where the premise for PageRank came from ... scientific papers.
Put all of this together, and we see that segmented indexing brings the ability to index the "template segments," often the box that constitutes the template, which on many sites contains a large part of the internal linking. I think these links could easily be dampened, so large sites that depend on this juice for link integrity would get hit hard on long-tail queries.
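Here is a toy Python illustration of the kind of dampening I am speculating about: each internal link is tagged with the segment it sits in, and links in the template segment count for less. The 0.25 factor, the segment labels and the URL are hypothetical; nobody outside Google knows whether or how such a discount is actually applied.

```python
from dataclasses import dataclass

# Hypothetical discount for links in the site-wide template (nav, sidebar, footer).
TEMPLATE_LINK_DAMPING = 0.25

@dataclass(frozen=True)
class InternalLink:
    target: str
    segment: str  # hypothetical labels: "template" or "content"

def internal_link_weight(links: list[InternalLink], damping: float = TEMPLATE_LINK_DAMPING) -> dict[str, float]:
    """Sum the weight each target page receives, discounting template-segment links."""
    totals: dict[str, float] = {}
    for link in links:
        weight = damping if link.segment == "template" else 1.0
        totals[link.target] = totals.get(link.target, 0.0) + weight
    return totals

if __name__ == "__main__":
    # A long-tail page linked from the template on 10 pages, plus one contextual link.
    links = [InternalLink("/widgets/blue-left-handed", "template")] * 10
    links.append(InternalLink("/widgets/blue-left-handed", "content"))
    print(internal_link_weight(links, damping=1.0))  # {'/widgets/blue-left-handed': 11.0}
    print(internal_link_weight(links))               # {'/widgets/blue-left-handed': 3.5}
```

The point of the sketch: a page whose "juice" comes almost entirely from template links loses most of it under any meaningful discount, while a single contextual link keeps its full weight.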
In the case of a blog/CMS, segmented indexing could mean the only "segments" needed for complete indexing are the comments and the post data; the rest is already indexed. In that case, a new post could easily be added to the search results immediately if a full-post RSS feed were set up. At the very least, Google could use the truncated post for initial discovery and, if the temporal Universal SERP validators were tripped, do a full indexing. BTW, this is similar to the algo that OneRiot uses.
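To make the "the rest is already indexed" idea concrete, here is a minimal Python sketch of segment-level change detection, assuming a page can be split into named segments (the names and the splitting are hypothetical; this is not Google's actual method). Each segment is fingerprinted, and only segments whose fingerprint changed since the last crawl would need indexing.

```python
import hashlib

def segment_fingerprints(segments: dict[str, str]) -> dict[str, str]:
    """Map each named page segment (hypothetical names like "template") to a SHA-256 hex digest."""
    return {
        name: hashlib.sha256(text.encode("utf-8")).hexdigest()
        for name, text in segments.items()
    }

def segments_to_index(previous: dict[str, str], current: dict[str, str]) -> list[str]:
    """Return the names of segments that are new or changed since the previous fingerprints."""
    current_fp = segment_fingerprints(current)
    return sorted(name for name, digest in current_fp.items() if previous.get(name) != digest)

if __name__ == "__main__":
    template = "<nav>Home | Blog | About</nav>"
    # Fingerprints from the last crawl: the template is already known.
    known = segment_fingerprints({"template": template, "post": "Old post", "comments": ""})
    # A new post arrives; only the post body and comments differ.
    new_page = {"template": template, "post": "Parallel indexing, baby!", "comments": "Great post!"}
    print(segments_to_index(known, new_page))  # ['comments', 'post']
```

For a blog, the template fingerprint rarely changes, so under this assumption each new post reduces to indexing just the post and comment segments.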
Another piece of the theory comes from another Google I/O 2010 video, this one by the PubSubHubbub development team. Pay special attention to the information near the beginning, where the speaker explains that when you ping FeedBurner to come and pick up your RSS feed, it is also passed along to PSHB (PubSubHubbub). Also keep in mind that any site can add an RSS feed, include it in FeedBurner/PSHB and ping FeedBurner, so what I am about to share is not restricted to sites with a CMS.
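For a site that manages its own feed, the ping itself is simple. Under the PubSubHubbub 0.3 spec, a publisher notifies its hub with a form-encoded POST carrying `hub.mode=publish` and one `hub.url` per updated feed; the hub then fetches the feed and pushes it to subscribers. The Python sketch below assumes that flow; the hub and feed URLs are placeholders, not FeedBurner's actual endpoint.

```python
from urllib.parse import urlencode
from urllib.request import Request, urlopen

def build_publish_ping(hub_url: str, feed_urls: list[str]) -> Request:
    """Build a PubSubHubbub 0.3 publish notification for one or more updated feeds."""
    # hub.mode=publish, with hub.url repeated once per updated feed (topic) URL.
    params = [("hub.mode", "publish")] + [("hub.url", url) for url in feed_urls]
    return Request(
        hub_url,
        data=urlencode(params).encode("ascii"),
        headers={"Content-Type": "application/x-www-form-urlencoded"},
        method="POST",
    )

def send_publish_ping(hub_url: str, feed_urls: list[str], timeout: float = 10.0) -> int:
    """POST the notification and return the HTTP status; hubs typically reply 204 No Content."""
    # urlopen raises urllib.error.HTTPError for 4xx/5xx responses.
    with urlopen(build_publish_ping(hub_url, feed_urls), timeout=timeout) as response:
        return response.status

# Example (placeholder URLs; this performs a real network request):
# send_publish_ping("https://hub.example.com/", ["https://www.example.com/feed.xml"])
```

Building the request separately from sending it keeps the notification easy to inspect or log before it goes out.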
Lastly, the Salmon Protocol is the third piece that pulls it all together and could theoretically end the need for crawling. Salmon is a protocol for sending comments and other replies made downstream (on aggregators and social sites) back upstream to the original source of the content. Currently, the conversation around a publisher's feed content (including blog comments) is scattered across a few RSS feed aggregators and social sites like Twitter. Comments are important because they are a piece of the content not accounted for in segmented indexing.
So what if the Salmon Protocol were actually a protocol to pull all the RSS feeds together? Then Google would just have to add the template segment and voilà, it has cut indexing considerably. What if the RSS feed could carry partial and full versions of the post data, with the other segments (comments, etc.) managed through PSHB? Comments could be signed and spam controlled, since each signed comment could be matched to an entity and therefore tracked more carefully and given more influence in the algo.
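To illustrate the "signed comment matched to an entity" idea, here is a simplified Python sketch. Salmon itself specifies public-key signatures ("Magic Signatures"); this sketch swaps in HMAC-SHA256 with a hypothetical per-author key registry purely to stay short and standard-library only, so treat it as an illustration of the concept, not of Salmon's wire format.

```python
import hashlib
import hmac

# Hypothetical registry mapping an author identity to a secret signing key.
# NOTE: real Salmon uses public-key Magic Signatures; HMAC is a simplified stand-in.
AUTHOR_KEYS = {"acct:terry@example.com": b"terry-demo-key"}

def sign_comment(author: str, body: str, keys: dict[str, bytes] = AUTHOR_KEYS) -> str:
    """Return a hex HMAC-SHA256 tag binding the comment body to its author."""
    message = f"{author}\n{body}".encode("utf-8")
    return hmac.new(keys[author], message, hashlib.sha256).hexdigest()

def verify_comment(author: str, body: str, signature: str, keys: dict[str, bytes] = AUTHOR_KEYS) -> bool:
    """Accept a comment only if its author is known and the signature matches."""
    if author not in keys:
        return False  # unknown entity: cannot be tracked, so give it no trust
    expected = sign_comment(author, body, keys)
    return hmac.compare_digest(expected, signature)

if __name__ == "__main__":
    sig = sign_comment("acct:terry@example.com", "Great post!")
    print(verify_comment("acct:terry@example.com", "Great post!", sig))       # True
    print(verify_comment("acct:terry@example.com", "Buy cheap pills!", sig))  # False
```

A tampered comment or an unknown author fails verification, which is the property that would let a verified comment be tied to an entity and weighted accordingly.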
Future Proofing 101
In conclusion, I want to remind you that this was an exercise in future proofing. I know what I will be watching for and how I can easily accommodate a change of this nature. Since RSS looks like the future of discovery, and quite possibly of indexing, the first thing I will do is put together a strategy and technology so that sites not running a CMS with RSS can add these publishing features and techniques. I will also review clients' use of social media and look for opportunities to add more Universal SERP content to their sites.
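For sites without a CMS, one possible starting point is generating the feed yourself. The sketch below builds a minimal RSS 2.0 document with only Python's standard library; the page fields, site title and URLs are placeholders, and a real site would pull its pages from wherever it already stores them.

```python
import xml.etree.ElementTree as ET
from datetime import datetime, timezone
from email.utils import format_datetime

def build_rss(title: str, site_url: str, description: str, pages: list[dict]) -> str:
    """Build a minimal RSS 2.0 document.

    Each page dict (hypothetical shape) needs "title", "link", "summary"
    and a timezone-aware "published" datetime.
    """
    rss = ET.Element("rss", version="2.0")
    channel = ET.SubElement(rss, "channel")
    ET.SubElement(channel, "title").text = title
    ET.SubElement(channel, "link").text = site_url
    ET.SubElement(channel, "description").text = description
    for page in pages:
        item = ET.SubElement(channel, "item")
        ET.SubElement(item, "title").text = page["title"]
        ET.SubElement(item, "link").text = page["link"]
        ET.SubElement(item, "guid").text = page["link"]  # the permalink doubles as a stable ID
        ET.SubElement(item, "description").text = page["summary"]
        # RSS 2.0 expects RFC 822 dates, e.g. "Sun, 20 Jun 2010 12:00:00 +0000".
        ET.SubElement(item, "pubDate").text = format_datetime(page["published"])
    return ET.tostring(rss, encoding="unicode")

if __name__ == "__main__":
    pages = [{
        "title": "New widget page",
        "link": "https://www.example.com/widgets/new.html",
        "summary": "A hand-built page on a site without a CMS.",
        "published": datetime(2010, 6, 20, 12, 0, tzinfo=timezone.utc),
    }]
    print(build_rss("Example Widgets", "https://www.example.com/", "Latest pages", pages))
```

Once the feed is published at a stable URL, it can be registered with FeedBurner/PSHB and pinged whenever pages change.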
Terry Van Horne is the founder of SeoPros and a 15-year veteran of Web development, currently working out of his consulting and development firm International Website Builders. Terry's interests are primarily the socialization of search and analysis of social Web traffic and applications like Twitter.