Dark Data provides Lucid moment as Big Data turf war heats up

  • 10-May-2012

WARNING: If you're already suffering from "Big Data" buzzword sickness, look away now. Lucid Imagination -- the professional open source distributor of the Apache Lucene/Solr search platform -- has announced a beta program for its formal entry into the "Big Data" turf war. It's called, predictably, LucidWorks Big Data.

The platform is built on top of Lucid's LucidWorks Solr distribution and extended with a range of additional Apache projects (including the Mahout "Machine Learning" engine). It lets approved beta customers build cloud-based sandboxes in which to test their data sources to a satisfactory level of accuracy, without first building the necessary architecture in-house.

Where this gets interesting is Lucid's use of the term "Dark Data" for unstructured data, and its acknowledgement that the vast majority of data retained within organizations is such dark matter. Much as IBM positioned its purchase of Vivisimo as a quality-control or curation layer that serves as the entry point for unstructured data into a Big Data system, Lucid is attempting the same with its LucidWorks Solr/Lucene distribution. Machine learning via Mahout adds the potential to build some classification and categorization functionality into this curation process. All the usual caveats about machine learning still apply here.

As we mentioned last time, sandboxing is where many -- if not most -- customers are right now in their Big Data journey. Whether Lucid's beta program suits these experiments will depend heavily on whether you have the skills needed to use the toolset.
