Lucene can read almost anything: Lucid and ISYS team up

A few months ago, I blogged about ISYS offering their document converter filters as a separate component. My thought was these would come in handy as an add-on to Lucene (which, by itself, can't actually read Microsoft Office files, let alone more exotic document types). That would still leave you with a bit of DIY work, though: integrating the filters in your Lucene implementation.

As it turns out, Lucid Imagination had exactly that idea. The company, which offers commercial support for Lucene and Solr, is now offering its own "LucidWorks" versions with the ISYS filters integrated. This means one of the gaps between open source and commercial search products has been bridged: with the filters, Lucene, too, can read over 200 file types.

According to Lucid, this has been one of the favorite doubts commercial vendors would cast over the open source search engine, and the move should level the playing field. However, as a customer, you should be aware that there are a couple of other things you may take for granted that are missing. Lucene doesn't come with connectors to various content repositories, for instance, or even a simple web crawler.

Still, the filters are a welcome addition, and they're certainly an improvement over what's currently available as open source. It's not just in the numbers: ask yourself how you think a converter will read a three-column Word document. You may be surprised to know that some will just go across all the first lines from left to right, then the second lines, etcetera. As always in Search & Information Access, the devil is in the details -- and knowing about these details will pay off.
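To make that reading-order problem concrete, here's a minimal sketch in Python. It has nothing to do with how the ISYS filters (or any real converter) work internally; the sample text and the two function names are made up purely for illustration. It only shows how the same three columns come out when read straight across versus column by column.

```python
from itertools import zip_longest

# Hypothetical page layout: each inner list is one column, top to bottom.
columns = [
    ["Alpha one", "Alpha two"],
    ["Beta one", "Beta two"],
    ["Gamma one", "Gamma two"],
]


def read_across(cols):
    """Naive order: the first line of every column, then the second lines, and so on."""
    rows = zip_longest(*cols)  # pads shorter columns with None
    return [line for row in rows for line in row if line is not None]


def read_by_column(cols):
    """Column-aware order: finish one column before starting the next."""
    return [line for col in cols for line in col]


if __name__ == "__main__":
    print("Across:   ", " / ".join(read_across(columns)))
    print("By column:", " / ".join(read_by_column(columns)))
```

The first function interleaves the columns, so sentences that belong together end up scattered through the indexed text, which hurts phrase matching and result snippets alike. The second keeps each column intact, which is what a reader would expect.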

The added filters aren't free, but they're not exactly expensive, either. There's a 14-day trial, and you can get a subset (e.g., Microsoft Office) of the filters for as little as $3,250 for 2 years, or pay $10,000 for all of them (including those pesky legacy formats you'll discover in a distant corner of your fileserver when you least expect it). That's still a long way off from the hundreds of thousands even a Google Appliance implementation may cost you in licensing. (Though there's no such thing as a free lunch or free beer with open source, either.)

So this is interesting news if you're considering Lucene, but what about ISYS? Aren't they selling the family silver? Well, let me wrap up this post by meandering off into history. As the (perhaps apocryphal) story has it, when the Dutch were at war with the Spanish in the 16th century, they were still selling cannons to their opponents. They figured they might as well make a profit out of it: the outcome would be determined by strategy, anyway.

Open source projects and commercial vendors, on the other hand, don't even have to be at war. And as with a Spanish Rioja or a Dutch Heineken, it's all about picking the right one for the occasion.

