Real Story Group. Make Better Technology Decisions.

Kas Thomas

Recommind productizes its categorization engine

18-Aug-2009

Tags: Enterprise Search, IDOL Server, Mindserver Enterprise Search, Publishing-Media

Anyone who's been involved in a corporate-taxonomy project knows exactly how the terms "tedium," "tiresome," and "taxonomy" are related. Each derives from the other.

At some point, technology should remove the need for taxonomy projects, even if it hasn't -- yet.

Help is on the way, though -- assuming you have, say, $150K (plus or minus a Toyota) to spend. Today, San Francisco-based Recommind, Inc. (one of the vendors we cover in our Search & Information Access Report) is introducing MindServer Categorization, a software system that does just what its name implies: It analyzes content, discovers logical categories within the content, and auto-tags each content item according to category relatedness.

Although it's being introduced today as a standalone product, MindServer Categorization -- technically speaking -- is not new. The product has been sold in Germany for years, where major media companies have used it to auto-categorize news feeds. Today's release represents the first time MindServer Categorization has been localized into English and productized for a general market (i.e., not just media firms).

Recommind is not the only company with auto-categorization technology, of course. (Autonomy, often seen on shortlists next to Recommind, is a familiar source of such technology.) But unlike others, Recommind uses PLSA (Probabilistic Latent Semantic Analysis) as a basis for category discovery, which means, among other things, that Recommind's software requires no training: It doesn't need to be exposed to a "training set" (or sets), to have access to a preexisting taxonomy, or to know about keywords. In fact, MindServer Categorization is not only self-training but language-agnostic. In theory, the underlying algorithms can discriminate categories in any corpus, regardless of what language the corpus is in.
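For readers curious what "no training set" means in practice, here is a minimal sketch of PLSA fitted with expectation-maximization in plain Python. It illustrates the published technique only and is not Recommind's implementation; the function name `plsa`, its parameters, and the toy corpus are all assumptions made for this example. The only inputs are raw text and a chosen number of latent topics: no labels, taxonomy, or keyword lists are supplied.

```python
import random
from collections import Counter


def plsa(docs, n_topics=2, n_iter=50, seed=0):
    """Fit a tiny PLSA model with EM. Returns (p_z_given_d, p_w_given_z, vocab)."""
    rng = random.Random(seed)
    # Whitespace tokenization: no dictionaries or language-specific rules.
    counts = [Counter(doc.lower().split()) for doc in docs]
    vocab = sorted({w for c in counts for w in c})

    def normalize(values):
        total = sum(values)
        return [v / total for v in values]

    # Random starting points for P(z|d) and P(w|z).
    p_z_d = [normalize([rng.random() + 0.1 for _ in range(n_topics)]) for _ in docs]
    p_w_z = [dict(zip(vocab, normalize([rng.random() + 0.1 for _ in vocab])))
             for _ in range(n_topics)]

    for _ in range(n_iter):
        # Accumulators start at a tiny value so no probability becomes exactly zero.
        new_w_z = [dict.fromkeys(vocab, 1e-12) for _ in range(n_topics)]
        new_z_d = [[1e-12] * n_topics for _ in docs]
        for d, c in enumerate(counts):
            for w, n in c.items():
                # E-step: posterior over topics for this (document, word) pair.
                post = normalize([p_z_d[d][z] * p_w_z[z][w] for z in range(n_topics)])
                # M-step accumulation, weighted by how often the word occurs.
                for z in range(n_topics):
                    new_w_z[z][w] += n * post[z]
                    new_z_d[d][z] += n * post[z]
        p_w_z = [dict(zip(vocab, normalize([t[w] for w in vocab]))) for t in new_w_z]
        p_z_d = [normalize(row) for row in new_z_d]
    return p_z_d, p_w_z, vocab


# Hypothetical toy corpus; real corpora are far larger and noisier.
corpus = [
    "court judge ruling appeal",
    "judge court verdict appeal",
    "goal striker match league",
    "match goal league striker",
]
doc_topics, topic_words, vocabulary = plsa(corpus, n_topics=2)
for text, mix in zip(corpus, doc_topics):
    dominant = max(range(len(mix)), key=lambda z: mix[z])
    print(f"topic {dominant}: {text}")
```

Note that the number of topics still has to be chosen up front, and EM can settle on different groupings depending on the random starting point, which is one reason to test any such system against your own content.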

Exactly how accurate the system is, you'll have to determine yourself by testing it against a corpus or two of your own. The rate of false positives and false negatives will vary according to the characteristics of the corpus and the tuning parameters you specify. (You can relax or tighten the system's "strictness" through config settings and a C++ API.) Don't expect this -- or any other -- auto-categorization system to be perfect, or anything close to it.
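The trade-off between false positives and false negatives can be sketched as follows. This does not use Recommind's config settings or C++ API; it assumes, purely for illustration, that "strictness" behaves like a confidence threshold, and the function name `error_counts` and the sample scores are hypothetical.

```python
def error_counts(scored_items, threshold):
    """Count false positives and false negatives at a given confidence threshold."""
    false_pos = sum(1 for score, relevant in scored_items
                    if score >= threshold and not relevant)
    false_neg = sum(1 for score, relevant in scored_items
                    if score < threshold and relevant)
    return false_pos, false_neg


# Hypothetical (score, truly_in_category) pairs from a hand-labeled test sample.
sample = [(0.95, True), (0.80, True), (0.70, False),
          (0.55, True), (0.40, False), (0.20, False)]

for threshold in (0.3, 0.6, 0.9):
    fp, fn = error_counts(sample, threshold)
    print(f"threshold={threshold}: false positives={fp}, false negatives={fn}")
```

On this made-up sample, tightening the threshold from 0.3 to 0.9 removes both false positives but introduces two false negatives, which is the kind of trade-off worth measuring on a hand-labeled slice of your own corpus.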

Notably, Recommind does a lot of business with law firms and legal departments, which use its search software to categorize e-mail (and to do more sophisticated things, such as divining who an organization's domain experts are, based on correspondence). Even so, some customers are content just to have Recommind's software separate content into two categories: garbage, and content that clearly should be saved.

If it's true, as some research indicates, that 1 GB of data can cost up to $20,000 to collect, process, review, and retain, then the $150K entry fee for MindServer Categorization would seem quite reasonable. (Bear in mind, maintenance is another ~$30K per year on top of that.) But one wonders how long it will be before entity-extraction software, auto-taggers, RDF extractors, and the like become commoditized through Open Source. Also, Recommind and Autonomy face a different sort of competition from the likes of Thomson Reuters, whose Calais project provides what amounts to semantic analysis as a service. (While it's true you might not want to send your entire corporate e-mail archive over the wire to the Calais service, nevertheless you might very well want to stream select RSS or Atom feeds through it -- and many people already do, apparently.) 
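To make the cost comparison concrete, here is a back-of-the-envelope sketch using only the figures quoted above: the $150K entry fee, roughly $30K per year in maintenance, and the upper-bound estimate of $20,000 per GB. The function name `breakeven_gb` is hypothetical, and the calculation assumes maintenance is billed for each year counted.

```python
def breakeven_gb(license_fee, annual_maintenance, years, cost_per_gb):
    """Gigabytes of avoided handling cost needed to cover the software spend."""
    total_spend = license_fee + annual_maintenance * years
    return total_spend / cost_per_gb


# License alone, at the upper-bound $20,000-per-GB figure.
print(breakeven_gb(150_000, 30_000, 0, 20_000))
# License plus three years of maintenance.
print(breakeven_gb(150_000, 30_000, 3, 20_000))
```

Because $20,000 per GB is an "up to" figure, a lower real-world cost per GB pushes the break-even volume higher.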

At the moment, the company with the greatest exposure (and therefore the most to lose) in this field is Autonomy, whose IDOL technology has cemented the company's reputation for intelligent information retrieval. It will be interesting to see whether upstart Recommind can put a dent in Autonomy's semantic suit of armor -- or whether the two companies are, in fact, destined to remain in separate categories forever.

Copyright Real Story Group 2001 - 2012. All rights reserved.
