Dark Data provides Lucid moment as Big Data turf war heats up

  • 10-May-2012

WARNING: If you're already getting "Big Data" buzzword sickness, look away now, as Lucid Imagination -- the professional open source distributor of the Apache Lucene/Solr search platform -- has announced a beta program for its formal entry into the "Big Data" turf war. It's called, predictably, LucidWorks Big Data.

Built on top of their LucidWorks Solr distribution and extended with a range of additional Apache projects (including the Mahout "Machine Learning" engine), the platform lets beta-approved customers build Cloud-based sandboxes to test their data sources to a satisfactory level of accuracy, without first building the necessary architecture in-house.

Where this gets interesting is Lucid's reference to "Dark Data" -- their term for unstructured data -- and their acknowledgement that the vast majority of data retained within organizations is such dark matter. Much as IBM positioned its purchase of Vivisimo as a quality control or curation layer at the entry point for unstructured data into a Big Data system, Lucid attempts the same with their LucidWorks Solr/Lucene distribution. Machine learning via Mahout adds the potential to build some classification/categorization functionality into this curation process. All caveats about machine learning still apply here.

As we mentioned last time, sandboxing is where many -- if not most -- customers are right now in their Big Data journey. Whether Lucid's beta program is suitable for these experiments will depend heavily on whether you have the skills in-house to use the toolset.