Real Story Group. Make Better Technology Decisions.

Theresa Regli

Follow Theresa on Twitter @TheresaRegli

Content cleanup in the former East Germany

26-Dec-2007

Tags: Component Content Management, Enterprise Search, Implementation, Information Architecture, Endeca Information Access Platform, IDOL Server

There's no time like the holidays for catching up on back issues of The Economist (don't worry, we're baking cookies, too), and this morning I found myself engrossed by a tale of pattern matching. No, not pattern matching of snowflakes or Christmas knits, but of a set of documents ripped into 600 million pieces by East Germany's State Security Service (better known as the Stasi), back when the Berlin Wall was being torn down and the mob was at the gates. The Stasi were afraid of documents falling into the wrong hands, so when the shredders failed, they frantically resorted to tearing up documents piece by piece. And you thought getting your enterprise search engine to pull off late-binding security was tough?

In a project currently underway at Berlin's Fraunhofer Institute for Production Systems and Design Technology, software is being used to find patterns in these millions of Stasi-created fragments of paper and re-assemble them, jigsaw-puzzle style. In going through the fragments, the software is grouping the scanned shreds of paper together by identifying patterns in handwriting, color, paper texture, even ink color. Then, once a group of related shreds is found, the software puzzles the papers together. In their haste, the Stasi actually helped this process quite a bit -- most of the fragments of the same document were found in the same bag. Or bucket. Category. Taxonomy facet, if you will.
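
To make the "same bag, same bucket" idea concrete, here is a minimal sketch of grouping shreds by shared features before any puzzling happens. It is not the Fraunhofer software: the `Fragment` fields, the exact-match grouping key, and the sample data are all assumptions made up for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Fragment:
    """A scanned shred. Every field is a hypothetical, hand-labeled feature."""
    frag_id: str
    paper_color: str
    ink_color: str
    handwritten: bool


def group_fragments(fragments):
    """Bucket fragments that share the same coarse features."""
    groups = {}
    for frag in fragments:
        # Assumption: exact feature equality stands in for real similarity scoring.
        key = (frag.paper_color, frag.ink_color, frag.handwritten)
        groups.setdefault(key, []).append(frag.frag_id)
    return groups


fragments = [
    Fragment("a1", "white", "blue", True),
    Fragment("a2", "white", "blue", True),
    Fragment("b1", "yellow", "black", False),
]
groups = group_fragments(fragments)
```

Only fragments inside one group would then be passed on to the (much harder) edge-matching step, which is the same reason a facet narrows a search before ranking kicks in.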

Like enterprise search tools that perform some sort of text mining and subsequent clustering -- such as Autonomy, FAST or Endeca -- this software has the capacity to learn and refine what it puts together, identifying new content as more or less like the original items in the set. When it gets confused (such as when a document has distorted or torn edges), it refers the act of judgement to a human being. But what's especially interesting about this software is that it actually spawns slightly altered versions of itself that compete for computer time on the basis of success at finding matches. Now that's something I'd love to see from my local enterprise search vendor.
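
The competition for computer time can be sketched roughly as follows. This is a guess at the general shape of such a scheme, not a description of how the actual software works: the variant names, the match counts, the single "threshold" parameter, and the proportional-share rule are all assumptions.

```python
import random


def allocate_time(match_counts, total_seconds):
    """Split compute time in proportion to each variant's count of matches found."""
    total = sum(match_counts.values())
    if total == 0:
        # No variant has found anything yet, so share the time equally.
        share = total_seconds / len(match_counts)
        return {name: share for name in match_counts}
    return {name: total_seconds * count / total
            for name, count in match_counts.items()}


def spawn_variant(threshold, rng, step=0.05):
    """Return a slightly altered copy of a matching threshold, clamped to [0, 1]."""
    return min(1.0, max(0.0, threshold + rng.uniform(-step, step)))


# Hypothetical tallies of matches found by three variants in the last round.
match_counts = {"variant_a": 60, "variant_b": 30, "variant_c": 10}
budget = allocate_time(match_counts, total_seconds=100)

# The most successful variant gets a slightly mutated offspring for the next round.
child_threshold = spawn_variant(0.8, random.Random(0))
```

Variants that find more matches earn more of the next round's time budget, and small random changes to their settings keep new candidates entering the race.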

There are a few lessons to be learned here. First, this is a multi-year project with dedicated resources, which is more than most companies are willing to commit to their own document scanning and indexing efforts. Second, while pattern matching may seem like an exact way to search for things, there are always factors in play that require judgement and refinement -- be it subtle linguistic differences, synonyms, or even how someone happened to tear something up.

And finally -- although history will surely welcome the Stasi's carelessness -- you should never take content security and storage lightly. You may think content is "secure enough," until you realize just how good your new enterprise search tool is at indexing all your content, but how bad it is at tying into your ACLs and showing the right results only to those who should see them.
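
For the curious, showing "the right results only to those who should see them" boils down to something like the sketch below: trim each result list against the ACLs at query time. The document names, group names, and the deny-by-default rule are illustrative assumptions, not any vendor's API.

```python
def visible_results(hits, user_groups, acls):
    """Keep only hits whose ACL shares at least one group with the user.

    Assumption: a document with no ACL entry is hidden rather than shown.
    """
    return [doc for doc in hits if acls.get(doc, set()) & user_groups]


# Hypothetical ACLs mapping each document to the groups allowed to read it.
acls = {
    "salary_review.doc": {"hr"},
    "lunch_menu.doc": {"hr", "engineering", "sales"},
}

# Raw hits from the index, before any security trimming.
hits = ["salary_review.doc", "lunch_menu.doc", "unlisted.doc"]
engineer_view = visible_results(hits, {"engineering"}, acls)
```

The index happily finds all three documents; only the check at query time keeps the salary review away from the engineer.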

Now, why can't I get my snowflake cookies to all look exactly alike?

Copyright Real Story Group 2001 - 2012. All rights reserved.