Multiple document silos - where to start?

Virtually every customer of ours manages multiple document repositories. The document volumes stored in these repositories can get enormous; even smaller organizations can produce millions of documents, while the largest run to billions. 

What is common across them all is the desire to consolidate, and to gain more value from the huge volume of information sitting in documents across their units. Sometimes this is no more than a desire to find things quickly, while in other cases the goal is to merge and integrate business process tasks with relevant and current information. In still other instances, the enterprise simply needs to reduce costs and do more with less.

Multiple repositories come in many different forms, be they hundreds of SharePoint sites, a handful of massive ECM systems, or a combination of shared drives and Outlook folders. But they all represent the same basic problem: "I have the information I need, it's somewhere, but I can't access it or find it easily, let alone leverage its full value."

When trying to improve these multiple-repository situations, people usually consider the following approaches:

  • Migrate everything into an Über Repository
  • Federate the management of the repositories in place
  • Start an information governance project
  • Work on Information Architecture/Taxonomy/Metadata
  • Take a federated search approach to the situation
  • Utilize APIs and/or EAI to integrate at the back end
  • Build a portal/mashup to integrate at the front end
  • Use BPM to integrate in the middle
  • Go the SOA route to deliver shared ECM services

In fact there are still more approaches you could take, and the options above are not mutually exclusive, but they are the most common.

What is not so common is taking an approach that tries to address the bad practices that created these situations in the first place.  Somebody once said, "It's OK to make mistakes, but not OK to make the same mistakes repeatedly." Yet that is what so many organizations do when it comes to managing information effectively.

Rather than jumping straight into one of the options listed above, I would advise you to take a step back and consider starting every information management project with a major cleaning exercise.

For example, if documents have not been accessed in X period of time, let's be honest, they are unlikely to be accessed ever again, and in most cases there is no legal or regulatory requirement for you to hoard dead information. So take the chance to identify deadwood data and get rid of it, preferably through a formal disposition process, or if you must, simply move it to cheap offline storage and "archive it." But whatever you do, work toward a situation where only active and relevant information is sitting in your silos. Move or destroy information that is not.
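As a rough illustration of the "not accessed in X period of time" test, here is a minimal Python sketch for a plain shared drive. Everything in it is an assumption made for illustration: the function name find_stale_files, the idea of treating the later of a file's access and modification timestamps as its "last use," and the 365-day threshold in the usage note. Access times are often unreliable (many file systems update them lazily or not at all), and SharePoint or ECM systems would need their own audit logs or APIs instead. The sketch only reports candidates; what to destroy or archive remains a decision for your disposition process.

```python
import os
import time
from pathlib import Path


def find_stale_files(root, max_age_days, now=None):
    """Yield (path, idle_days) for files under root not used within max_age_days.

    "Used" is approximated as the later of the access and modification
    timestamps, which is an assumption: access times may not be maintained.
    """
    now = time.time() if now is None else now
    cutoff = now - max_age_days * 86400  # 86400 seconds per day

    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = Path(dirpath) / name
            try:
                info = path.stat()
            except OSError:
                # Unreadable or vanished file: skip it rather than abort the scan.
                continue
            last_used = max(info.st_atime, info.st_mtime)
            if last_used < cutoff:
                idle_days = int((now - last_used) // 86400)
                yield path, idle_days
```

Calling list(find_stale_files("/path/to/share", 365)) on a hypothetical share would give a candidate list to review with records management before anything is moved or destroyed.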

For many organizations such a clean-out delivers more value than the rest of the project activities put together. In some cases content volumes are reduced by 80% or more, and as a result, when you are browsing, searching, or mining content, you are only accessing current, relevant information. It also makes the job of consolidating, migrating, or integrating information silos much more worthwhile.

This should always be your starting point: clean data. Otherwise you'll spend most of your time knitting together, in one way or another, an awful lot of useless files.
