Adriaan Bloem
16-May-2008
Tags: Enterprise Search, Open Source, Selecting Technology, FAST and Search 2010, Google Search Appliance, Lucene and Solr, OmniFind Enterprise Edition, Thunderstone Search Appliance
Searching for information -- really, how hard can it be? So why wouldn't you go out and get a search engine that's free? Well, to stick to the analogy of "free beer," you might wake up in the morning with a headache, only to find your wallet gone.
Of course, I'm paraphrasing the definition of "free software". Richard Stallman's example is used to point out the ambiguity of the term "free" in the English language. With free software, "you should think of free as in free speech, not as in free beer." Nevertheless, you should be warned: both open source beer (now in version 3.3) and free commercial beer have the potential for leaving you with a bit of a hangover.
If you really think enterprise search is a simple commodity -- and my only comment is the obligatory one that readers of our Enterprise Search Report will probably know better -- a free product would be an ideal way to get your feet wet (albeit somewhat sticky). I get invited to BYOB enterprise search parties a lot, and usually show up with Apache Lucene, IBM Omnifind Yahoo! Edition, and Microsoft Search Server 2008 Express. Let's get a closer taste of each.
Apache Lucene. Lucene is open source, which you are free to use. The problem is, it's not a complete enterprise search product -- it's a "text search engine API." What you get is a Java JAR with the core functionality of a search engine. In typical hardcore Java developer understatement this is described as "you write the easy stuff, the UI and the process of selecting and parsing your data files to pump them into the search engine, yourself." To developers that doesn't sound too difficult -- it's a library they'd be able to use to create search functionality for many applications. As they embark on that journey, however, many will find out they'll have to become experts on enterprise search to get their implementation to perform basic tasks any Google user has come to expect. Index Word documents? You'll have to convert those to text first. Remove stop words or perform spell checking? You'll have to get some more jars to fit that in. And that familiar user interface isn't so easy to replicate, either.
Of course, there are a couple of more "pre-packaged" Lucene-based engines (such as Nutch and Solr), but they'll only take you so far on that long and winding road. There are some excellent examples of what you can achieve with Lucene, but many more of how hard it can be to get there.
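To give a rough feel for what a "text search engine API" leaves on your plate, here is a minimal sketch in plain Python. It is emphatically not Lucene's API: the `TinyIndex` class, the `tokenize` helper, the stop-word list, and the sample memos are all made up for illustration. It only shows the bare core (tokenize, drop stop words, build an inverted index, answer AND queries), and assumes your documents are already plain text.

```python
import re
from collections import defaultdict

# Hypothetical stop-word list for illustration; real engines ship far
# longer, language-specific lists.
STOP_WORDS = {"a", "an", "and", "the", "of", "to", "in", "is"}


def tokenize(text):
    """Lowercase the text, split on non-alphanumerics, drop stop words."""
    return [t for t in re.findall(r"[a-z0-9]+", text.lower())
            if t not in STOP_WORDS]


class TinyIndex:
    """A toy inverted index: maps each term to the set of documents containing it."""

    def __init__(self):
        self.postings = defaultdict(set)

    def add(self, doc_id, text):
        # The caller must supply plain text; converting Word or PDF files
        # is exactly the part you would still have to build yourself.
        for term in tokenize(text):
            self.postings[term].add(doc_id)

    def search(self, query):
        """Return the ids of documents that contain every query term."""
        terms = tokenize(query)
        if not terms:
            return set()
        result = set(self.postings.get(terms[0], set()))
        for term in terms[1:]:
            result &= self.postings.get(term, set())
        return result


index = TinyIndex()
index.add("memo-1", "The price of free beer")
index.add("memo-2", "Enterprise search is not a commodity")
index.add("memo-3", "Free search and the enterprise")

print(index.search("free search"))
```

Everything users actually notice is still missing here: relevance ranking, spell checking, file format conversion, crawling, security, and a user interface. That is the work the paragraph above warns about.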
IBM Omnifind Yahoo! Edition (or OY!E). The Google appliances have the Google brand behind them, which must have got the IBM people thinking the Yahoo! brand would be excellent marketing for their free-to-use search engine. In fact, it's neither IBM nor Yahoo's technology, but Lucene wrapped in other open source software. A few commercial bits thrown in create a product that's easy to install and run. It will actually do many of the things Lucene will make you work hard to accomplish: it comes with support for several languages and quite a few source content filters. For users, it looks like a regular web search engine; for admins, there's a nicely designed and intelligible interface. In short, it does most of the things a Google Mini appliance will do -- but for free.
So what's the catch? Well, the license (by the way, what license?) limits you to 500,000 documents and 5 collections. After that, you can "upgrade" to other Omnifind products. But since the technology across the Omnifind line-up is completely different, upgrading is the same as starting from scratch, and you'll pay for the privilege. I've been critical of the limitations of Google's appliances in the past, and sure, the 50,000 document limit of the entry-level Google Mini is a lot less than OY!E's half a million. But that comparison isn't really fair, considering that the Mini comes with the hardware to run the queries on, for a mere $2,990. And don't think you'll be able to run IBM's software on an old abandoned test server you have available -- OY!E will need more power than the single-blade Google Mini or Thunderstone Appliance to match the performance. Tellingly, I wasn't able to dig up an example of an OY!E implementation to mention while researching the Enterprise Search Report (if you know of one, let me know).
Microsoft Search Server 2008 Express. Microsoft's free offering is basically the same software as the non-Express version, but then there's the seemingly innocent limitation: one server only. I wouldn't want to continue the theme of this post by saying this is akin to handing out free samples of beer to get you hooked; suffice it to say that if you start to run the Express version in a production environment, there will, no doubt, come a time when a single server won't be enough anymore. When you've come to rely on the solution, you'll suddenly have to shell out for the licenses. As I've said before, having a free lunch isn't necessarily a bad thing; just remember that you'll probably have to pay for the beer the lunch comes with.
So, this might all start sounding like advice your mother gave you: never take anything from a stranger, and certainly no free alcoholic beverages. Don't forget, however, that I'm Dutch, and I've certainly developed a taste for enterprise search. Free beer sounds too good to be true, but it could certainly get your party started; just remember to drink in moderation, and never, ever, drink and drive.
Copyright Real Story Group 2001 - 2012. All rights reserved.