Enterprise search: free as in free beer?

Searching for information -- really, how hard can it be? So why wouldn't you go out and get a search engine that's free? Well, to stick to the analogy of "free beer," you might wake up in the morning with a headache, only to find your wallet gone.

Of course, I'm paraphrasing the definition of "free software". Richard Stallman's example is used to point out the ambiguity of the term "free" in the English language. With free software, "you should think of free as in free speech, not as in free beer." Nevertheless, you should be warned: both open source beer (now in version 3.3) and free commercial beer have the potential for leaving you with a bit of a hangover.

If you really think enterprise search is a simple commodity -- and I will only comment on that with the obligatory statement that readers of our Enterprise Search Report will probably know better than that -- getting a free product would be ideal to get your feet wet (albeit somewhat sticky). I get invited to BYOB enterprise search parties a lot, and usually come up with Apache Lucene, IBM Omnifind Yahoo! Edition, and Microsoft Search Server 2008 Express. Let's get a closer taste of each.

Apache Lucene. Lucene is open source, which you are free to use. The problem is, it's not a complete enterprise search product -- it's a "text search engine API." What you get is a Java JAR with the core functionality of a search engine. In typical hardcore Java developer understatement this is described as "you write the easy stuff, the UI and the process of selecting and parsing your data files to pump them into the search engine, yourself." To developers that doesn't sound too difficult -- it's a library they'd be able to use to create search functionality for many applications. As they embark on that journey, however, many will find out they'll have to become experts on enterprise search to get their implementation to perform basic tasks any Google user has come to expect. Index Word documents? You'll have to convert those to text first. Remove stop words or perform spell checking? You'll have to get some more jars to fit that in. And that familiar user interface isn't so easy to replicate, either.

Of course, there are a couple more "pre-packaged," Lucene-based engines (such as Nutch and Solr), but they'll only take you so far on that long and winding road. There are some excellent examples of what you can achieve with Lucene, but many more of how hard it can be to get there.
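To give a feel for the plumbing a bare search library leaves on your plate, here is a minimal sketch in plain Java. It is emphatically not Lucene's API: the class name TinyIndex, its methods, and the tiny stop word list are all made up for illustration, and it assumes your documents have already been converted to plain text. It tokenizes text, drops stop words, and keeps an inverted index from each term to the documents that contain it.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical toy index for illustration only; this is not how Lucene is used.
class TinyIndex {
    // Assumed stop word list; a real deployment needs one per language.
    private static final Set<String> STOP_WORDS =
            Set.of("a", "an", "and", "the", "of", "to", "in", "is");

    // Inverted index: term -> sorted set of document ids containing it.
    private final Map<String, Set<String>> postings = new HashMap<>();

    // Lowercase the text, split on anything that is not a letter or digit,
    // and drop empty tokens and stop words.
    public static List<String> tokenize(String text) {
        List<String> terms = new ArrayList<>();
        for (String raw : text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{Nd}]+")) {
            if (!raw.isEmpty() && !STOP_WORDS.contains(raw)) {
                terms.add(raw);
            }
        }
        return terms;
    }

    // Index one document that has already been converted to plain text.
    public void add(String docId, String text) {
        for (String term : tokenize(text)) {
            postings.computeIfAbsent(term, k -> new TreeSet<>()).add(docId);
        }
    }

    // Return the ids of documents containing every non-stop-word query term.
    public Set<String> search(String query) {
        Set<String> result = null;
        for (String term : tokenize(query)) {
            Set<String> docs = postings.getOrDefault(term, Set.of());
            if (result == null) {
                result = new TreeSet<>(docs);
            } else {
                result.retainAll(docs);
            }
        }
        return result == null ? new TreeSet<>() : result;
    }

    public static void main(String[] args) {
        TinyIndex index = new TinyIndex();
        index.add("memo.txt", "Free beer and free speech");
        index.add("report.txt", "The price of beer");
        System.out.println(index.search("free beer")); // prints [memo.txt]
    }
}
```

Note everything this sketch leaves out: file format conversion, relevance ranking, spell checking, security, and a user interface. That missing part is exactly the work the "easy stuff" turns out to be.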

IBM Omnifind Yahoo! Edition (or OY!E). The Google appliances have the Google brand behind them, which must have got the IBM people thinking the Yahoo! brand would be excellent marketing for their free-to-use search engine. In fact, it's neither IBM nor Yahoo's technology, but Lucene wrapped in other open source software. A few commercial bits thrown in create a product that's easy to install and run. It will actually do many of the things Lucene will make you work hard to accomplish: it comes with support for several languages and quite a few source content filters. For users, it looks like a regular web search engine; for admins, there's a nicely designed and intelligible interface. In short, it does most of the things a Google Mini appliance will do -- but for free.

So what's the catch? Well, the license (by the way, what license?) limits you to 500,000 documents and 5 collections. After that, you can "upgrade" to other Omnifind products. But since the technology across the Omnifind line-up is completely different, this is the same as starting from scratch, and you'll pay for the privilege. I've been critical of the limitations of Google's appliances in the past, and sure, the 50,000 document limit of the entry-level Google Mini is a lot less than OY!E's half a million. But that comparison isn't really fair, considering that the Mini actually comes with the hardware to run the queries on for a mere $2,990. And don't think you'll be able to run IBM's software on an old abandoned test server you have available -- OY!E will need more power than the single blade Google Mini or Thunderstone Appliance to match the performance. Tellingly, I wasn't able to dig up an example of an OY!E implementation to mention while researching the Enterprise Search Report (if you know of one, let me know).

Microsoft Search Server 2008 Express. Microsoft's free offering is basically the same software as the non-Express version, but then there's the seemingly innocent limitation: one server only. I wouldn't want to continue the theme of this post by saying this is akin to handing out free samples of beer to get you hooked; suffice it to say that if you start to run the Express version in a production environment, there will, no doubt, come a time when a single server won't be enough anymore. When you've come to rely on the solution, you'll suddenly have to shell out for the licenses. As I've said before, having a free lunch isn't necessarily a bad thing; just remember that you'll probably have to pay for the beer the lunch comes with.

So, this might all start sounding like advice your mother gave you: never take anything from a stranger, and certainly no free alcoholic beverages. Don't forget, however, that I'm Dutch, and I've certainly developed a taste for enterprise search. Free beer sounds too good to be true, but it could certainly get your party started; just remember to drink in moderation, and never, ever, drink and drive.
