The Death of Taxonomies, revisited

  • 13-Nov-2009

Earlier this year I caused quite a stir when I predicted the death of taxonomies. Taxonomists worldwide told me I was an idiot, nuts, completely delusional. Some were deeply concerned that their jobs were threatened, as if employers would change org charts based on my prediction. Others secretly told me they agreed.

Of course, as so often happens in these dark days of 140-character tweets, my prediction was often taken out of context. I had predicted the death of traditional, monolithic, single-hierarchy taxonomies, as well as the death of what I’d call the typical turn-of-the-21st-century taxonomy project (which I did dozens of times as a taxonomist), where librarians and/or linguists spend a few months in an organization determining how enterprise content should be categorized so that content technology can use it optimally. This project would usually be followed by an even longer period when people would admire the taxonomy and nod knowingly, saying “that’s exactly what we need!”, but not tag anything, despite the roadmap and project plan saying they should.

As 2010 fast approaches, I’ve never been more sure of my prediction. Metadata continues to be vital, but technology is constantly getting better at mining and organizing it. As an example, this week I visited three organizations in Paris using Sinequa (one of the vendors we evaluate in our Search & Information Access research) on their intranets. In an approach similar to Endeca’s, entity extraction and semantic analysis create multi-faceted categorizations by people, country, city, language, companies, and other topics. Most of the content was unstructured; no taxonomy or tagging projects were undertaken.

“In over a hundred categorizations, we only have found two small errors in the past year,” said one implementer, from one of France’s largest wireless service providers. “We refine categorizations, but it takes very little time,” said another implementer at a systems integrator. “We wouldn’t have undertaken an enterprise taxonomy project because we never could have spent the time and money to write scripts or manually tag everything afterwards.”

Taxonomists might decry this as foolhardy, but the fact is that these companies achieved the results they wanted and increased the productivity and efficiency of their knowledge workers. These examples are not to say the technology is perfect -- far from it. My point is to reiterate that taxonomists need to adapt and work with technology to improve what they can achieve for enterprises.

Also, the title “taxonomist” should die, as it pushes people into the mindset of fixed hierarchies and navigations, despite over a decade of efforts to change that. Evolve the title, I say, into metadata architect or information cartographer. Those are far more descriptive of what this role must be -- and for the more savvy, already is -- as we move into the next decade.

Metadata architects can no longer get away with being topic generalists; they must be specialists in the industry content they’re refining and must understand the end user: what are the specialized topics that perhaps aren’t contained in content, that can’t be extracted, and that would make knowledge workers more efficient? Another customer I met with this week, a large French government agency, pointed out that the main thing their search tool couldn’t extract meaning from was acronyms. “We had to make a list of all the acronyms we use,” said the IT director. “Once we spelled out the acronyms, what they stood for and their synonyms, the software worked beautifully.”
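To make the acronym list concrete, here is a minimal sketch in Python of how such a list might feed a search tool through query expansion. It is not how the agency or its vendor did it; the `ACRONYMS` entries and the `expand_query` function are hypothetical, invented purely for illustration.

```python
import re

# Hypothetical acronym list: each acronym maps to its spelled-out form
# and synonyms. A real list would come from the organization itself.
ACRONYMS = {
    "HR": ["human resources", "personnel"],
    "RFP": ["request for proposal", "tender"],
}

def expand_query(query, acronyms=ACRONYMS):
    """Return the original query plus one variant per expansion of each acronym found."""
    variants = [query]
    for acronym, expansions in acronyms.items():
        # Match the acronym as a whole word, case-sensitively,
        # so that "HR" matches but "hr" or "HRT" does not.
        pattern = re.compile(r"\b" + re.escape(acronym) + r"\b")
        if pattern.search(query):
            for expansion in expansions:
                # A function replacement avoids treating the expansion
                # text as a regex replacement template.
                variants.append(pattern.sub(lambda m: expansion, query))
    return variants

print(expand_query("HR policy on leave"))
```

The point of the design is that the specialist maintains the list, not the code: the software only does the substitution, while knowing which acronyms and synonyms matter is the human contribution.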

Taking taxonomies beyond what technology can achieve on its own is the metadata architect’s challenge for the next decade, because technology is at the point where it achieves what taxonomists were doing a decade ago.

For buyers of technology, remember that different entity extraction and search tools are often geared towards different kinds of content; we detail this in our Search & Information Access product evaluation research. There are also higher- and lower-risk scenarios for allowing technology to do more vs. less work. Legal firms should have more categorization checks from a metadata architect or a content specialist than a news agency, where topics are more wide-ranging and less fraught with risk if an end user doesn’t find something.

As I’m based in Europe this quarter, I’ll be missing Taxonomy Boot Camp in San Jose, CA for the first time in several years. Last year, the opening session devolved into a debate as to how to define a taxonomy and a taxonomist. This year, I propose embracing a new era of metadata architects, ones who work with technology rather than being willfully ignorant of its inner workings. It’s only by studying the “how” of technology just as much as the “what” of the content that we’ll get to the next stage of content management, search, and information access.