The Semantic Web, SEO, and your CMS

Two weeks ago, schema.org was introduced: a joint effort by Google, Microsoft, and Yahoo that defines a common vocabulary, and the metadata markup to go with it, for snippets on pages. This is more or less an evolution of Google's rich snippets, but now with broader support.

If you haven't heard of these before, the basic idea is that a search engine can identify specific elements in your page, such as people, products, events, or recipes. Because you mark up these elements with metadata within your HTML, a search engine can extract them and present them in a more meaningful way in its results pages. This is somewhat similar to features in Facebook's Open Graph protocol; however, Open Graph describes the entire page, whereas schema.org describes components on it.

The schema.org syntax is microdata, a new (proposed) standard to go with HTML5, which has caused a bit of a hubbub in the RDFa community. While I can see the concerns -- and to some extent agree -- the whole argument is quite academic. If you're trying to get your content noticed by search engines, and want them to better "understand" what's on your pages, you'll have to go with schema.org's microdata. The semantic web is one of those old promises that never quite delivered: it needed support from the central hubs of the web to be meaningful. Those central hubs are search engines and social networks, and they dictate how you make your content readable to them. So it's time to be pragmatic -- if you care about SEO, you'll have to adopt the format proposed by Google, Bing, and Yahoo, and its (however limited) syntax and vocabulary.
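To make the syntax a little more concrete, here is a minimal sketch in Python of what a template helper might emit for a single person. The microdata attributes (itemscope, itemtype, itemprop) and the schema.org Person properties name, jobTitle, and url are real; the function name, the sample values, and the surrounding div/span layout are assumptions made purely for illustration.

```python
from html import escape


def person_snippet(name: str, job_title: str, url: str) -> str:
    """Render one person as an HTML fragment carrying schema.org microdata.

    Hypothetical helper: itemscope opens an item, itemtype names its
    schema.org type, and each itemprop labels one property of that item.
    """
    return (
        '<div itemscope itemtype="http://schema.org/Person">\n'
        f'  <span itemprop="name">{escape(name)}</span>,\n'
        f'  <span itemprop="jobTitle">{escape(job_title)}</span>\n'
        f'  (<a itemprop="url" href="{escape(url)}">profile</a>)\n'
        '</div>'
    )


if __name__ == "__main__":
    # Sample values are made up for the example.
    print(person_snippet("Jane Doe", "Editor", "http://example.com/jane"))
```

The point to notice is that the markup wraps one component inside the page, not the page as a whole, so every such component needs its own pass through a helper like this.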

However, if you're running a website and want to start getting noticed with your snippets, the exact standard used will be the least of your worries. You'll need to change your CMS templates (now would be a good time to check the relevant sections in our Web Content Management research to see which system will actually allow you the flexibility to add custom code as you see fit). But more importantly, if you're using a straightforward, page-oriented system, you'll now have to describe components within the page. This may turn into a nightmare for your editors, if it means hacking the tag syntax into the source code for each individual item within the page.

This will put those who have invested in a componentized data model, and a CMS that can handle it, at quite an advantage. If you want to mark up individual products, offers, events, persons, et cetera -- those had better already be present in your system as separate entities, instead of as part of a blob of HTML from a rich text editor. Again, now would be a good time to read up on the relevant sections in our research. That may sound like a bit of a cop-out, but the ramifications of this go well beyond what I can type up in a single blog post.
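As a rough illustration of why separate entities help, the sketch below assumes a hypothetical CMS that stores each event as its own record with discrete fields. The Event class, its field names, and both render functions are invented for this example; only the microdata attributes and the schema.org Event properties name, startDate, and location come from the vocabulary itself (location is given as plain text here for brevity).

```python
from dataclasses import dataclass
from html import escape
from typing import List


@dataclass
class Event:
    """Hypothetical CMS entity: one event stored as separate fields."""
    name: str
    start_date: str  # ISO 8601 date string, e.g. "2011-06-20"
    location: str


def render_event(event: Event) -> str:
    """Render a single event as a list item with schema.org microdata."""
    date = escape(event.start_date)
    return (
        '<li itemscope itemtype="http://schema.org/Event">'
        f'<span itemprop="name">{escape(event.name)}</span>, '
        f'<time itemprop="startDate" datetime="{date}">{date}</time>, '
        f'<span itemprop="location">{escape(event.location)}</span>'
        '</li>'
    )


def render_event_list(events: List[Event]) -> str:
    """The template applies the markup once; editors only fill in fields."""
    items = "\n".join("  " + render_event(e) for e in events)
    return f"<ul>\n{items}\n</ul>"


if __name__ == "__main__":
    # Sample data is made up for the example.
    print(render_event_list([Event("Product launch", "2011-06-20", "Amsterdam")]))
```

With a model like this, the markup lives in one template and applies to every event. If the same events sit inside a rich text blob, there is nothing for a template to hook into, and each item has to be tagged by hand.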

So while much of the online discussion is about the exact syntax, the pros and cons of adopting a brand new standard, and whether or not the search engines' schema.org water will mix well with Facebook's Open Graph oil -- that's not the problem in real scenarios. What you're going to have to deal with is a much more granular content model. If you, your team, and your system aren't prepared for this, your presence in search engine results pages will tank.

There's a lot to learn about managing granular, componentized content, and it's very different from simply publishing pages. With the search engines now pushing the semantic web, you'll need to get a handle on this sooner rather than later. It's time to start managing snippets, instead of blobs.

