Solr heads for an even sunnier future

Anyone who's been watching the search space for a while knows that Apache Solr -- the popular open-source search server built on Lucene -- is the elephant in the room for a great many product-selection teams these days. It may be an exaggeration to say that most product-selection discussions begin with "What about Solr?" But not by much.

The release of Solr 1.4 will no doubt only intensify debates over the virtues of build versus buy and open source versus vendor lock-in. Suffice it to say, Solr 1.4 is jam-packed with enhancements designed to make a Lucene-based search system faster, more scalable, easier to replicate, and more flexible as a development platform. Space precludes a full discussion of those things here (for that, you'll need to consult our Search and Information Access Report), but a couple of items deserve quick mention.

One of Solr's signal weaknesses has been its slavish reliance on XML as an input format for document ingestion. If all of your content happens to exist natively in XML form (or can easily be converted to it), chances are you'll be thrilled with Solr from Day One and can use it more or less off the shelf. On the other hand, if you're trying to index a repository full of Word and PDF documents, what then? It used to be that you were on your own.
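
To make the XML dependence concrete, here is a minimal Python sketch of the kind of `<add>` message Solr's XML update handler expects. The helper name `build_add_message` and the sample field names are hypothetical, chosen only for illustration; the fields you actually send would have to match your own Solr schema.

```python
import xml.etree.ElementTree as ET


def build_add_message(docs):
    """Serialize a list of {field: value} dicts as a Solr <add> message.

    Hypothetical helper for illustration; field names must match
    whatever your Solr schema defines.
    """
    add = ET.Element("add")
    for doc in docs:
        doc_el = ET.SubElement(add, "doc")
        for name, value in doc.items():
            # A list or tuple becomes repeated <field> elements
            # (a multi-valued field); anything else is a single value.
            values = value if isinstance(value, (list, tuple)) else [value]
            for v in values:
                field = ET.SubElement(doc_el, "field", name=name)
                field.text = str(v)  # ElementTree escapes &, <, > on output
    return ET.tostring(add, encoding="unicode")


if __name__ == "__main__":
    print(build_add_message([{"id": "doc1", "title": "Hello & welcome"}]))
```

Anything that is not already in this shape (a Word file, a PDF) has to be converted to it before Solr will accept it, which is the gap described above.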

Enter "Solr Cell" (a play on Solr Content Extraction Library, Solr CEL), a Solr 1.4 feature that uses the content-extraction capabilities of Apache Tika to parse common office document formats. With Solr Cell, you can fairly quickly set Solr up to ingest PDF, OpenDocument, Word, PowerPoint, Excel, RTF, ZIP, and other document formats. This is a welcome development indeed.
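
As a rough sketch of what that looks like in practice, the Python below posts a binary file to Solr Cell's extracting handler. It assumes the handler is mapped at `/update/extract` (the path used in Solr's own examples) and that the schema's unique key is `id`; the base URL, file name, and function names are hypothetical, and your configuration may differ.

```python
import urllib.parse
import urllib.request


def build_extract_url(base_url, doc_id, commit=True):
    """Build the request URL for the extracting handler.

    Assumes the handler is mapped at /update/extract; `literal.id`
    supplies the unique key for the document being indexed.
    """
    params = {"literal.id": doc_id}
    if commit:
        params["commit"] = "true"
    query = urllib.parse.urlencode(params)
    return base_url.rstrip("/") + "/update/extract?" + query


def index_file(base_url, doc_id, path, content_type):
    """POST the raw file bytes to Solr and return the HTTP status code."""
    with open(path, "rb") as fh:
        body = fh.read()
    request = urllib.request.Request(
        build_extract_url(base_url, doc_id),
        data=body,
        headers={"Content-Type": content_type},
    )
    with urllib.request.urlopen(request) as response:
        return response.status


# Example call (hypothetical server and file):
# index_file("http://localhost:8983/solr", "doc1", "report.pdf", "application/pdf")
```

Tika does the parsing on the server side, so the client's only job is to hand over the bytes and an identifier.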

Another area in which Solr and Lucene have traditionally fallen short, but are now set to make big strides, is display technology -- or to be more precise, facilitating the creation of displayable widgets. The big news here is AJAX Solr, a JavaScript framework that enables easy (or at least easier) design of search widgets that can be populated with JSON data obtained via AJAX calls. The AJAX Solr framework is actually a fork of SolrJS, which in turn was a Google Summer of Code project in 2008. AJAX Solr doesn't actually create display widgets, but it does the next best thing by providing AbstractWidget classes and hooks into your choice of popular AJAX helper libraries (Dojo, jQuery, MooTools, or Prototype) or even a custom library.
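
To show the kind of JSON such a widget consumes, here is a small Python sketch (not AJAX Solr itself, which is JavaScript) that unpacks facet counts from a Solr response requested with `wt=json`, `facet=true`, and a `facet.field`. In Solr's default JSON layout each facet field comes back as a flat list of alternating terms and counts. The function name, the field name `format`, and the sample response are all made up for illustration.

```python
import json


def facet_counts(response_text, field):
    """Return [(term, count), ...] for one facet field.

    Assumes Solr's default JSON layout, in which a facet field is a
    flat list: [term1, count1, term2, count2, ...].
    """
    data = json.loads(response_text)
    flat = data["facet_counts"]["facet_fields"][field]
    # Pair every even-indexed term with the count that follows it.
    return list(zip(flat[0::2], flat[1::2]))


if __name__ == "__main__":
    # Hand-written sample shaped like a Solr response (not real data).
    sample = json.dumps({
        "response": {"numFound": 19, "start": 0, "docs": []},
        "facet_counts": {"facet_fields": {"format": ["pdf", 12, "doc", 7]}},
    })
    for term, count in facet_counts(sample, "format"):
        print(term, count)
```

A facet widget does essentially this on the client: it pairs terms with counts, then renders them as clickable links.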

As it turns out, Solr 1.4 and Lucene 2.9 also bring a number of much-needed performance enhancements (some of them quite sophisticated). We'll provide more details to our subscribers. Let's just say, for now, that scaling a Lucene system no longer means the system has to slow to (ahem) a crawl.

Solr isn't all-powerful. It hasn't yet incorporated text-mining tools of the kind that would raise eyebrows at, say, Autonomy or Recommind, and there's work to be done in the relevance-ranking department. Nevertheless, progress over the past year has been swift. One wonders where Lucene and its satellite projects will take the industry over the next couple of years. Something tells me that any big-search-vendor CTOs who aren't thinking about such things now will have a lot to think about when they wake up from their naps.

