Tagging your web content

31-Aug-2009

It's one of those elusive dreams of web content management: a completely metadata-driven publishing model. Especially when there's lots of content, and a variety of sites or channels targeting different audiences. Wouldn't it be great if content more or less automatically found its way to the right places? The same items appearing in all the right spots, without laboriously having to copy them or even attach them to a specific point in your website tree?

Here's an example that's been tried around the world with varying success. Say you're running the web presence of some medical organization. As such, you have information on how to deal with various diseases, both on a general level (hygiene and disease prevention) and very specific (what to do when a flu breaks out).

Suddenly there is an outbreak of a new disease; let's say the elephant flu. You could prepare a news bulletin, which would automatically appear on information portals for medical professionals, consumers, etc. -- all the sites targeted to specific audiences for which this news might be of interest.

Better still, since you've already built up a large repository of information, it would be easy to launch an elephant flu theme site: just define the kind of content you'd want in there, and hey presto, with one click you've got an entire site with all the information (http://www.allaboutelephantflu.org). Content specifically on the elephant flu, but also the more generic topics on how to deal with a disease or whom to contact.
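To make the idea concrete, here is a minimal sketch of what "define the kind of content you'd want" could look like. It assumes a channel is nothing more than a set of tags, and that an item belongs on a channel when it carries at least one of them. The names Item, Channel, and select are made up for illustration and don't come from any particular CMS.

```python
from dataclasses import dataclass, field


@dataclass
class Item:
    """A piece of content plus the tags an editor attached to it."""
    title: str
    tags: set[str] = field(default_factory=set)


@dataclass
class Channel:
    """A site or portal, defined only by the tags it is interested in (assumption)."""
    name: str
    any_of: set[str]

    def select(self, items):
        # An item qualifies when it shares at least one tag with the channel.
        return [item for item in items if item.tags & self.any_of]


# Hypothetical repository content, loosely following the elephant flu example.
repository = [
    Item("Elephant flu outbreak bulletin", {"elephant flu", "news"}),
    Item("Hand-washing basics", {"hygiene", "prevention"}),
    Item("Annual report", {"corporate"}),
]

theme_site = Channel("Elephant flu theme site", {"elephant flu", "hygiene", "prevention"})
titles = [item.title for item in theme_site.select(repository)]
```

Note that nothing here places content in a site tree: the bulletin and the generic hygiene page both land on the theme site purely because of their tags, which is exactly why the quality of those tags matters so much.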

You can imagine why this is a compelling concept. Which is probably why I've seen attempts in many different areas, ranging from media companies and governments to product marketing and insurance companies.

But, I said, elusive dream. Many have tried (I count myself among them), and many have failed (unfortunately, I can't really discount myself entirely from that group, either). That's because there are three major problems when you've actually implemented the infrastructure to do it. Since this is only a blog post, I'll pick the most obvious one for now.

The content needs metadata for this to work. Many will tell you that "people won't tag." No, seriously, they won't tag content with the right labels, add the right metadata, or correctly categorize, "even if threatened with being fired." And even if they do tag, it will be haphazard and inconsistent.

This is a very real problem. But at the same time it's complete nonsense. Because if this were the case, why would people meticulously tag and file their holiday snapshots on Flickr and Facebook? Somehow, in their spare time, they do identify the people in a picture, add keywords to a shot, give it a meaningful title, and actually describe it. Without having to be threatened with being fired, or even having to be beaten with a stick.

Partly this is because they get the feedback that makes it worth their while to do so. If you identify your friends in a picture on Facebook, they (and then their friends) will immediately find it and start commenting, which creates a positive feedback loop to tag some more. More importantly though, it's really easy.

If you get back to work the next day, and have to laboriously click ten times, scroll, add, categorize, while thinking what the right category within the taxonomy would be, it all feels like an inordinate amount of trouble to go through. In most WCM systems and implementations (and dare I say it -- most ECM implementations are much, much worse) it's just too much trouble.

Fortunately, I'm beginning to see some change. There are now quite a few ways in which CMSs can make it easier on your editors to identify the content they're producing:

Using a "free-for-all" folksonomy, where you can just quickly type in a few keywords. The problem, of course, is that the tags will often be wildly inconsistent and ambiguous. Check a tag cloud near you for tags like "New York," "NY," "newyork," and of course, the typo that got away, "new yok." This can be mitigated by type-ahead auto-completion of tags. Some systems will start listing suggestions as you type, and helpfully, with some, what you type doesn't have to be the beginning of the tag ("york" will also suggest "New York"). The auto-complete effectively normalizes the tags (i.e., at least all of them will be "New York"). The tags may still be ambiguous and inconsistent, but at least for many purposes, they'll be workable.
Using suggestions. Usually with the help of an embedded search engine such as Lucene, the system comes up with tags and related links for your content (based on similar content it finds). At first, this will need quite a bit of training, but the great thing is that the more content is accurately identified, the better the system gets. The suggestions can be used to completely automate the process, but since you'll still have the original author at hand in the editing screen, you can take advantage of this and ask them to validate the suggestions as they are saving the content. That's a lot easier than having to think them up themselves.
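Both approaches can be sketched in a few lines. This is a toy illustration, not how any of the products mentioned here work: complete does a case-insensitive substring match against an assumed list of known tags, and suggest_tags uses plain word overlap as a crude stand-in for what a search engine such as Lucene would do far better. All names and sample data are hypothetical.

```python
from collections import Counter

# Hypothetical list of tags that already exist in the system.
KNOWN_TAGS = ["New York", "New Jersey", "Yorkshire", "Flu"]


def complete(fragment, known_tags=KNOWN_TAGS):
    """Return existing tags containing the typed fragment anywhere, ignoring case."""
    needle = fragment.strip().lower()
    if not needle:
        return []
    return [tag for tag in known_tags if needle in tag.lower()]


def suggest_tags(text, tagged_items, limit=3):
    """Suggest tags taken from already tagged items that share words with the text.

    tagged_items is a list of (item_text, tags) pairs. Word overlap is a
    deliberately naive similarity measure, used here only for illustration.
    """
    words = set(text.lower().split())
    scores = Counter()
    for item_text, tags in tagged_items:
        overlap = len(words & set(item_text.lower().split()))
        for tag in tags:
            scores[tag] += overlap
    # Drop tags whose source items share no words with the new text.
    return [tag for tag, score in scores.most_common(limit) if score > 0]


already_tagged = [
    ("flu outbreak in the city", ["flu", "news"]),
    ("washing hands prevents infection", ["hygiene"]),
]
completions = complete("york")
suggestions = suggest_tags("new flu outbreak reported", already_tagged)
```

Because the editor picks from tags that already exist, typing "york" ends up stored as "New York" rather than as yet another variant. The suggestions are meant to be shown to the author for confirmation at save time, not applied blindly.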

It's still more common to see folksonomy functionality smoothly integrated into Social Software, and auto-categorization or auto-classification is an area where Search & Information Access systems are usually way ahead. But a few web CMSs are making headway into this territory, as well. For GOSS, whose customers are mostly in UK government, the ability to suggest related content and categorize it in the IPSV (the Integrated Public Sector Vocabulary) is a unique selling point, and the company has been honing this for the past few years. Hippo (who, incidentally, recently won a large Dutch government account, so there may be a pattern there) is working on releasing similar functionality this fall.

There are others with such capabilities; but at the same time, many are lagging. A system that hides keywords and categories on the fourth tab, three items down under "metadata," and then makes users jump through hoops to enter the information isn't likely to help. And some products are so thin on metadata, no amount of customization is ever going to make them work for your users. Carefully check before you buy.

But the good news is that if you share this dream of at least partly automating a metadata-driven architecture, the metadata part of that dream can be realized. Of course, that means there are at least two other major hurdles to clear -- but that's enough for now. I'll return to this topic again...

Real Story Group
