Adriaan Bloem
11-Jun-2009
Tags: Enterprise Search, Marketplace at Large, Google Search Appliance
Last week, I posted a highly critical comment on Google's marketing of the Appliance, version 6. My main qualm is that the hyperbole makes it very hard to understand what it is they're actually selling. What you get with a GSA is not exactly what you see on YouTube (well, the box is, but not necessarily the internals).
Of course, in my quest to get you the real story, I'm not going to leave it at "press releases and documentation don't match up". The interesting bit is what the software is actually capable of; even more interesting is what customers are doing with it in reality.
For now, I'll zoom in on what made the headlines: the Appliance's new capability to index billions of documents, rather than the 30 million of the previous version. I noted two things about this:
Google got in touch with me to explain this, and this led to two surprises.
First of all: Dynamic Scalability is, in fact, the feature that would enable indexing billions of documents, and this isn't a beta feature. So what about the documentation's reference to a 30 million document limit? As it turns out, this is an error in Google's documentation. (For now, the error is still in the "Guide to Software Release 6.0", but I've been told this will be corrected.) According to Google, there is no hardwired limit to the number of documents you can index using multiple machines (as long as you buy lots and lots of Appliances to do it on, of course).
Secondly, about the difference between indexing 10-digit phone numbers or 40 MB PDFs: I've been told that the Appliance's hardware is carefully over-spec'ed to handle the load Google claims it can deal with. (The Dell PowerEdge R710s the vendor ships would outperform many commodity servers.) My 40 MB comment was a bit of a jab: an Appliance won't index documents larger than 30 MB. But as Google explained, the limit has been set so they can guarantee that when they say a GB-7007 can index 10 million documents, it can actually index 10 million of those 30 MB PDFs when that's what you need to do. And to be fair, if large documents are an issue for you, you'll want to read our Search & Information Access Report product evaluations carefully, since most enterprise search products have similar limits.
In the end, of course, the proof will be in the pudding: even if the software is capable of tying together 38 appliances to index a billion documents, this may not mean you'd actually want to. What are minor issues on a smaller corpus suddenly become major problems on that scale, and I'm looking forward to seeing how real enterprises are faring in deploying a cluster of GSAs for such high volumes.
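For a rough sense of the scale involved, here is a back-of-the-envelope sketch in Python. It is only naive ceiling division: the function name is made up for illustration, and the per-appliance capacities are the figures quoted in this post (30 million documents as the earlier per-version limit, 10 million for a GB-7007), used here as assumptions rather than as Google's sizing guidance. It ignores any clustering overhead or redundancy, so it does not reproduce the 38-appliance figure mentioned above.

```python
def appliances_needed(total_docs: int, docs_per_appliance: int) -> int:
    """Naive lower bound: how many boxes to hold total_docs documents.

    Hypothetical helper for illustration only. It assumes every appliance
    can be filled to its stated capacity and that clustering adds no overhead.
    """
    if total_docs < 0 or docs_per_appliance <= 0:
        raise ValueError("total_docs must be >= 0 and docs_per_appliance > 0")
    # Integer ceiling division, avoiding floating-point rounding.
    return (total_docs + docs_per_appliance - 1) // docs_per_appliance


if __name__ == "__main__":
    one_billion = 1_000_000_000
    # Assumed capacities, taken from the figures quoted in the post.
    for label, capacity in [("30 million per box", 30_000_000),
                            ("10 million per box (GB-7007)", 10_000_000)]:
        print(label, "->", appliances_needed(one_billion, capacity))
```

Even on the most generous assumption, a billion-document index means dozens of boxes, which is exactly where small operational issues start to compound.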
And if anything: you still shouldn't believe the hype. Google's "billion document index" headline was syndicated across hundreds of news sources before even Google itself found out its documentation contradicted this. You'll want to be sure to get your information from a reliable source.