Adriaan Bloem
23-Feb-2011
Tags: Web Content and Experience Management, Usability
I've been working with web content management systems for almost fifteen years now. And exasperatingly, I still see the same project problems recur constantly. Some of this is because of a lack of education -- it seems the field has grown a lot quicker than the general level of knowledge about the basics of content management. But a lot of it is just the same old technical problems.
Exhibit A: copy/pasting from Microsoft Word.
Where does content commonly come from when it's repurposed for the Web? Microsoft Office, which is pretty much the standard for office productivity applications. In fact, it's quite usual for editors to send in their content as Word documents -- with webmasters or web managers diligently copying all the text, and pasting it into a rich text editor within a CMS.
Or rather, pasting it into Notepad, and then pasting it into the editor. Because what Word leaves on the clipboard is Microsoft's interpretation of what HTML should look like -- and that's quite a mess. Redmond's proprietary tags routinely break pages and standard layouts. And then there's the separate problem of content encoding -- Word's "smart" (curly) quotes often don't translate too well. In short, Word doesn't really separate content and design -- one of the basic tenets of content management.
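To make the encoding problem concrete, here is a minimal Python sketch. The function names and the replacement table are my own illustration, not part of any CMS or editor. It shows how a curly quote turns into gibberish when UTF-8 bytes are read as Windows-1252, and one blunt way to sidestep the issue by mapping Word's typographic characters to plain ASCII.

```python
# Hypothetical helper names; the mapping below is an illustrative subset.
SMART_CHARS = {
    "\u2018": "'",    # left single quote
    "\u2019": "'",    # right single quote / apostrophe
    "\u201c": '"',    # left double quote
    "\u201d": '"',    # right double quote
    "\u2013": "-",    # en dash
    "\u2014": "--",   # em dash
    "\u2026": "...",  # ellipsis
}


def normalize_smart_chars(text: str) -> str:
    """Replace Word's typographic characters with plain ASCII equivalents."""
    return text.translate(str.maketrans(SMART_CHARS))


def misread_as_cp1252(text: str) -> str:
    """Simulate a page that stores UTF-8 but is decoded as Windows-1252."""
    return text.encode("utf-8").decode("cp1252")


if __name__ == "__main__":
    pasted = "\u201cIt\u2019s fine\u201d"
    print(normalize_smart_chars(pasted))
    # A single left curly quote becomes three unrelated characters.
    print(misread_as_cp1252("\u201cHi"))
```

Flattening the characters loses typography, so in practice you would rather fix the encoding mismatch itself; the sketch only shows why the quotes are the first thing to break.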
Most systems nowadays have some sort of solution to this. Popular rich text editors like CKEditor and TinyMCE have buttons to either paste plain text only (the equivalent of the Notepad intermediary) or "clean" the Word content. Alternatively, your CMS may offer filters that will try to scrub the HTML after it is saved.
Cleaning, however, never quite works. Either too much gets stripped, so tables or more complex document structures don't make it across; or too little, leaving us with a bunch of tags with unpredictable results. All of this is difficult to get right. (I know this all too well, having once tried my hand at writing an XSLT filter for the purpose. The horror!) Unrealistic expectations here can lead to many help-desk calls -- "the CMS screwed up my document" -- and the like.
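For a sense of why cleaning is a trade-off, here is a rough allowlist filter in Python using only the standard library. It is a sketch under my own assumptions (the class name, function name, and the list of allowed tags are all hypothetical), not how CKEditor, TinyMCE, or any particular CMS does it. It keeps a handful of structural tags, drops every attribute, and silently discards everything else.

```python
import html
from html.parser import HTMLParser

# Assumed allowlist for illustration; a real site would choose its own.
ALLOWED_TAGS = {"p", "br", "b", "i", "strong", "em", "ul", "ol", "li", "h1", "h2", "h3"}
VOID_TAGS = {"br"}                 # tags that never get a closing tag
SKIP_CONTENT = {"style", "script"}  # drop these tags and their contents


class WordHtmlCleaner(HTMLParser):
    """Keep allowlisted tags (without attributes) and all visible text."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.out = []
        self.skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in SKIP_CONTENT:
            self.skip_depth += 1
        elif self.skip_depth == 0 and tag in ALLOWED_TAGS:
            # Attributes such as class="MsoNormal" and style="..." are dropped.
            self.out.append(f"<{tag}>")

    def handle_endtag(self, tag):
        if tag in SKIP_CONTENT:
            self.skip_depth = max(0, self.skip_depth - 1)
        elif self.skip_depth == 0 and tag in ALLOWED_TAGS and tag not in VOID_TAGS:
            self.out.append(f"</{tag}>")

    def handle_data(self, data):
        if self.skip_depth == 0:
            self.out.append(html.escape(data, quote=False))

    # Comments (including Word's conditional comments) are ignored by default.


def clean_word_html(markup: str) -> str:
    cleaner = WordHtmlCleaner()
    cleaner.feed(markup)
    cleaner.close()
    return "".join(cleaner.out)


if __name__ == "__main__":
    word = '<p class="MsoNormal" style="margin:0cm"><b>Hello</b> <span style="color:red">world</span><o:p></o:p></p>'
    print(clean_word_html(word))
    print(clean_word_html("<table><tr><td>A</td><td>B</td></tr></table>"))
```

The first call leaves a tidy paragraph. The second shows the other side of the coin: because table tags are not on the allowlist, the cells collapse into one run of text. Add them to the list and you then have to decide which table attributes to keep, and so on. That's the "too much or too little" problem in miniature.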
The reality is that the only reliable way to get text from Office to the web editor is "text only" -- forget any formatting. That's what the Notepad route does; and it's what Google's Chrome browser now does with CTRL + SHIFT + V.
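The "text only" route is easy to approximate in code, which is part of why it is so dependable. The following Python sketch is my own illustration (the names are hypothetical, and it does not claim to reproduce what Notepad or Chrome do internally). It throws away all markup and keeps only the text, with a line break at the end of each block-level element.

```python
from html.parser import HTMLParser

# Assumed set of elements that should end a line in the plain-text output.
BLOCK_TAGS = {"p", "div", "li", "tr", "h1", "h2", "h3", "h4", "h5", "h6"}


class TextOnly(HTMLParser):
    """Collect visible text, ignoring every tag and attribute."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.parts = []
        self.skipping = False  # True while inside <style> or <script>

    def handle_starttag(self, tag, attrs):
        if tag in ("style", "script"):
            self.skipping = True
        elif tag == "br":
            self.parts.append("\n")

    def handle_endtag(self, tag):
        if tag in ("style", "script"):
            self.skipping = False
        elif tag in BLOCK_TAGS:
            self.parts.append("\n")
        elif tag in ("td", "th"):
            self.parts.append(" ")  # keep neighbouring cells apart

    def handle_data(self, data):
        if not self.skipping:
            self.parts.append(data)


def paste_as_text(markup: str) -> str:
    parser = TextOnly()
    parser.feed(markup)
    parser.close()
    # Collapse runs of whitespace and drop empty lines.
    lines = (" ".join(line.split()) for line in "".join(parser.parts).splitlines())
    return "\n".join(line for line in lines if line)


if __name__ == "__main__":
    print(paste_as_text('<p class="MsoNormal"><b>Title</b></p><p>Body <i>text</i></p>'))
    print(paste_as_text("<table><tr><td>A</td><td>B</td></tr><tr><td>C</td><td>D</td></tr></table>"))
```

Nothing proprietary can survive this, but neither can anything else: bold, headings, and the table grid are all gone, and the editor has to rebuild them by hand in the CMS.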
It's fair to say only Microsoft could really fix this. How hard would it be to just paste minimal markup, instead of proprietary lingo? This isn't exactly rocket science, cold fusion, or teleportation. So, I asked the company.
The problem for Microsoft, of course, is that while pasting into web applications is common, pasting from one Office document to another is much, much more common. In those cases, you'll often want to preserve formatting, and according to Redmond, "the HTML clipboard format in Word is optimized for those scenarios." What's more, there's now the Office Web Apps -- so Microsoft enables pasting into those web versions of the Office suite with all formatting intact, too.
That's all fair, but what about the web editor and her tedious clean-up process? Well, according to Microsoft, "[Y]ou can save your documents as 'Web Page, Filtered' where the extra markup will be removed and you will be left with a simpler set of HTML markup." Alas, even filtered HTML is not entirely MS-free.
So, there's a glimmer of hope, yet we remain pretty much where we've been for the past decade on this problem. There is no single answer to something as simple as copying text from an Office document and pasting it into your CMS. Microsoft's solution is a bit cumbersome and incomplete, and Google's rips out tables and other content you may like to keep.
However, instead of blaming Microsoft for this, consider it a reminder. The trenches aren't glamorous, but it's where you're most likely to encounter hurdles. There are plenty more day-to-day obstacles to getting it right. And nobody's going to magically fix this for you any time soon.
Copyright Real Story Group 2001 - 2012. All rights reserved.