Kas Thomas
28-Oct-2009
Tags: Enterprise Search, IDOL Server, Lucene and Solr, Mindserver Enterprise Search
Anyone who's been watching the search space for a while knows that Apache Solr -- the popular open-source search server built on Lucene -- is the elephant in the room for a great many product-selection teams these days. It may be an exaggeration to say that most product-selection discussions begin with "What about Solr?" But not by much.
The release of Solr 1.4 will no doubt intensify debates over the virtues of build versus buy and open source versus vendor lock-in. Suffice it to say, Solr 1.4 is jam-packed with enhancements designed to make a Lucene-based search system faster, more scalable, easier to replicate, and more flexible as a development platform. Space precludes a full discussion of those things here (for that, you'll need to consult our Search and Information Access Report), but a couple of items deserve quick mention.
One of Solr's signal weaknesses has been its slavish reliance on XML as an input format for document ingestion. If all of your content happens to exist natively in XML form (or can easily be converted to XML), chances are you'll be thrilled with Solr from Day One and can use it more or less off the shelf. On the other hand, if you're trying to index a repository full of Word and PDF documents, what then? It used to be that you were on your own.
Enter "Solr Cell" (a play on Solr Content Extraction Library, Solr CEL), a Solr 1.4 feature that uses the content-extraction capabilities of Apache Tika to parse common office document formats. With Solr Cell, you can fairly quickly set Solr up to ingest PDF, OpenDocument, Word, PowerPoint, Excel, RTF, ZIP, and other document formats. This is a welcome development indeed.
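To make that concrete, here is a minimal sketch of handing a binary document to Solr Cell over HTTP. It assumes the extracting handler is registered at /update/extract in solrconfig.xml and that Solr is reachable at a base URL such as http://localhost:8983/solr; the function names, the document ID, and the file path are hypothetical, chosen only for illustration.

```python
from urllib.parse import urlencode
from urllib.request import Request, urlopen


def build_extract_url(solr_base, doc_id, commit=True):
    """Return the URL for posting one binary document to Solr Cell.

    Assumes the extracting handler is mapped to /update/extract.
    """
    # literal.id supplies the unique key, since a PDF or Word file has none.
    params = {"literal.id": doc_id}
    if commit:
        params["commit"] = "true"
    return solr_base.rstrip("/") + "/update/extract?" + urlencode(params)


def index_file(solr_base, doc_id, path, content_type="application/pdf"):
    """POST the raw file bytes; Tika parses them on the server side."""
    with open(path, "rb") as handle:
        body = handle.read()
    request = Request(
        build_extract_url(solr_base, doc_id),
        data=body,
        headers={"Content-Type": content_type},
    )
    with urlopen(request) as response:
        return response.status
```

A call such as index_file("http://localhost:8983/solr", "doc1", "report.pdf") would send the file as-is; no client-side conversion to XML is involved, which is the point of the feature.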
Another area in which Solr and Lucene have traditionally fallen short, but are now set to make big strides, is display technology -- or to be more precise, the creation of displayable widgets. The big news here is AJAX Solr, a JavaScript framework that enables easy (or at least easier) design of search widgets that can be populated with JSON data obtained via AJAX calls. The AJAX Solr framework is a fork of SolrJS, which in turn was a Google Summer of Code project in 2008. AJAX Solr doesn't create display widgets itself, but it does the next best thing by providing AbstractWidget classes and hooks into your choice of popular AJAX helper libraries: Dojo, jQuery, MooTools, Prototype, or even a custom library.
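The JSON exchange that such widgets depend on can be sketched in a few lines. The example below is not AJAX Solr's API (that framework is JavaScript); it is a Python illustration of the underlying request and response, assuming Solr's standard select handler, the wt=json response writer, and the default flat layout for facet counts. The function names and the field name "cat" are hypothetical.

```python
from urllib.parse import urlencode


def build_facet_query(solr_base, query, facet_field):
    """Build a select URL that asks Solr for JSON output with facet counts."""
    params = [
        ("q", query),
        ("wt", "json"),          # JSON response writer
        ("facet", "true"),       # turn faceting on
        ("facet.field", facet_field),
    ]
    return solr_base.rstrip("/") + "/select?" + urlencode(params)


def facet_pairs(response, facet_field):
    """Turn Solr's flat [term, count, term, count, ...] list into pairs.

    `response` is the already-decoded JSON body (a dict).
    """
    flat = response["facet_counts"]["facet_fields"][facet_field]
    return list(zip(flat[0::2], flat[1::2]))


# Hand-written sample shaped like a Solr JSON response, for illustration only.
sample = {"facet_counts": {"facet_fields": {"cat": ["electronics", 3, "music", 1]}}}
pairs = facet_pairs(sample, "cat")
```

A facet widget would render each (term, count) pair as a clickable link; the framework's job is to wire that rendering to the request cycle.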
Solr 1.4 and Lucene 2.9 also bring a number of much-needed performance enhancements (some of them quite sophisticated). We'll provide more details to our subscribers. Let's just say, for now, that scaling a Lucene system no longer means the system has to slow to (ahem) a crawl.
Solr isn't all-powerful. It hasn't yet incorporated text-mining tools of the kind that would raise eyebrows at, say, Autonomy or Recommind, and there's work to be done in the relevance-ranking department. Nevertheless, progress over the past year has been swift. One wonders where Lucene and its satellite projects will take the industry over the next couple of years. Something tells me that any big-search-vendor CTOs who aren't thinking about such things now will have a lot to think about when they wake up from their naps.
Copyright Real Story Group 2001 - 2012. All rights reserved.