La taxonomie est morte! Vive la taxonomie...

  • 12-Mar-2008

Conference events and tracks are getting more niche and more specific, and rightly so: every year the knowledge we accumulate about content technology gets deeper and more nuanced, which calls for more specific presentations and more probing questions from implementers. Earlier this week in London, I attended an event on The Essentials of Meta Data and Taxonomy, and despite having talked about this topic for 10 years now, I admit I was pleasantly surprised that there were over 100 registrants for such a specific event. The conference featured perspectives from implementers, vendors, and yours truly, the token analyst.

The theme set for the day was The Semantic Revolution, which many are already referring to as Web 3.0, when meaningful and relevant information will more readily be exchanged among systems and targeted to people in a very precise way (without your even specifying what you're seeking). Most claim that taxonomies and metadata are necessary to make it happen. As we note repeatedly in our Enterprise Search Report, search technology needs good categorization and metadata to perform well, and finding information is only one small piece of the Semantic Revolution. But, as this event demonstrated, people are still struggling to get taxonomies and metadata in place: tagging content and agreeing on controlled vocabularies are anything but easy. Both require serious business process modification and change management.

Meanwhile, a week earlier at the AIIM Expo, one of the panelists on the keynote CIO panel said, "Don't spend time on a taxonomy. We did, and it was a waste." Twenty questions immediately popped into my head. What did they do wrong? Did they fail to involve end-users? Did they have a taxonomist who didn't work with subject matter experts? Did they try to bite off too much at once, or perhaps build a relevant taxonomy without knowing how to put it to good use? Or was it in a system somewhere, but no one bothered to tag the content? One day prior, during a long chat with Steve Arnold, he argued that none of this stuff -- meaning taxonomies and the technology that needs them to function -- will matter pretty soon.

While that may be the case at some future date, it's not the case now for businesses trying to find information today. Yes, text mining technology is getting better at extracting meaning from content and in turn categorizing it or putting it to use, and one day my cell phone may just let my doctor know immediately if I'm having a heart attack. The technology exists now to be able to do that. But the car has also existed for over 100 years, and much of the continent of Africa still doesn't have roads. Useful technology without infrastructure doesn't go very far.

I'm on the Eurostar train from London to Paris as I type this, thinking: who is Amtrak kidding that the Acela is "high-speed"? Superior technology doesn't always get deployed if there are bureaucracy and restrictions in the way, or when there are inconsistent standards from state to state (or system to system). The poor English woman next to me is paranoid about getting a Métro ticket out to La Défense during a possible strike. Even though she carries a little piece of technology that could translate a query for her, she's asked for my help. Why? Because the technology she has fails to adapt to and handle the semantic nuances she needs to deal with an irregular situation. So does every single piece of content technology on the market today. I use all these examples because the best technology, or the ideal technology, takes a long, long time to become commonplace, and there will always be regulatory, organizational, financial, and personal barriers to its adoption. And this is why we still need taxonomies and metadata for today's technologies to function, at least for the immediate future.

For now, content is stove-piped in multiple systems, and search has made people lazy. People think the answer should be as easy as a keyword. But finding the answers to our biggest findability questions by typing in a keyword is no easier than it is for a non-French speaker to get a ticket on a working Métro line during a strike. Getting there is no easier than what Amtrak had to do to get the tracks laid down for Acela, and even then, organizational and regulatory disarray kept the train from going as fast as it could have.

One thing we often forget about search is that the answer to our question is not necessarily what's literally in the text of the document that answers it. Documents about the Eurostar might never contain the words "high-speed train." This is where the technology still falls short, for now, and where taxonomies and metadata strategies fill the gaps. However, I don't think the technology will fall short for very much longer. Despite many years as a taxonomist, I agree with Steve: someday, content technology won't need taxonomies to function. Someday we will get accurate categorization automatically. Eventually, categorization may not really matter at all. But that time isn't now. The technology we all deal with today has a long way to go.
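
For the implementers in the audience, the Eurostar point can be made concrete with a small Python sketch. Everything in it is invented for illustration: the two-document "corpus," the one-entry taxonomy, and the function names `expand_query` and `search` are assumptions, not any product's API. It only shows the general idea of expanding a query with terms from a controlled vocabulary, which is one simple way a taxonomy can fill the gap, not how any particular search engine does it.

```python
# Hypothetical mini-taxonomy: a concept mapped to narrower terms filed under it.
TAXONOMY = {
    "high-speed train": ["Eurostar", "TGV", "Acela"],
}

# Hypothetical documents. Note that doc1 never says "high-speed train".
DOCUMENTS = {
    "doc1": "The Eurostar runs from London to Paris.",
    "doc2": "Agreeing on a controlled vocabulary is anything but easy.",
}


def expand_query(query, taxonomy):
    """Return the query plus any narrower terms the taxonomy lists for it."""
    terms = [query]
    terms.extend(taxonomy.get(query.lower(), []))
    return terms


def search(query, documents, taxonomy=None):
    """Naive substring search; uses taxonomy expansion only if one is given."""
    terms = expand_query(query, taxonomy) if taxonomy else [query]
    hits = []
    for doc_id, text in documents.items():
        lowered = text.lower()
        # A document matches if any of the (possibly expanded) terms appears.
        if any(term.lower() in lowered for term in terms):
            hits.append(doc_id)
    return hits


if __name__ == "__main__":
    print(search("high-speed train", DOCUMENTS))            # keyword only
    print(search("high-speed train", DOCUMENTS, TAXONOMY))  # with taxonomy
```

With the keyword alone, the search finds nothing; with the taxonomy, it finds the Eurostar document. The catch, of course, is that someone had to build and maintain that mapping, which is exactly the hard part described above.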