Scalability the Terracotta Way

One of the theoretical advantages of Java-based Portals and Content Management applications is the ability to cluster servers for better performance. The reality is that clustering is a black art that few vendors and implementation teams ever seem to master. So it comes as a (welcome) surprise to learn of an open-source technology that delivers many (if not most) of the things customers want here, quickly and painlessly, at low cost. There is no need to recompile code or stay up nights learning about disturbing-sounding concepts like "STONITH" (shoot the other node in the head).

The technology in question is called Terracotta, and it works by clustering the Java Virtual Machine in such a way that even a participating JVM itself doesn't know that it has been enlisted in a coordinated effort of any kind. Through a clever bit of boot-time bytecode instrumentation, Terracotta intercepts a handful of core bytecode instructions (the ones that read and write object fields and that acquire and release locks), achieving transparent virtualization across any number of enlisted VMs, under the control of a Terracotta server that lives in "aspect space." The Java memory model is not altered. Application code does not have to handle locks any differently, follow any special APIs, or even know that it's been clustered. Have I lost you here? Think of it this way: Instead of implementing special cluster services at the application level using product-specific APIs, Terracotta clusters the Java heap itself, underneath your applications.
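
To make the "no special APIs" point concrete, here is a minimal sketch of the kind of ordinary Java that would be a candidate for clustering. Everything in it is hypothetical: the class name HitCounter, the field counts, and the page paths are invented for illustration. The sketch assumes that the decision about which object is shared, and which synchronized blocks become cluster-wide locks, is made in Terracotta's external configuration, which is not shown here.

```java
import java.util.HashMap;
import java.util.Map;

/**
 * Plain single-JVM Java: no clustering imports, no vendor API calls.
 * Assumption: under Terracotta, external configuration (not shown) would
 * name the `counts` field as shared state and treat the synchronized
 * blocks below as cluster-wide locks. Run as-is, it is just local code.
 */
class HitCounter {
    // Hypothetical shared state. In an unclustered JVM this lives on the local heap only.
    private static final Map<String, Integer> counts = new HashMap<>();

    /** Increments the hit count for a page and returns the new value. */
    static int record(String page) {
        synchronized (counts) {            // ordinary Java monitor
            int next = counts.getOrDefault(page, 0) + 1;
            counts.put(page, next);
            return next;
        }
    }

    /** Returns the current hit count for a page, or 0 if it was never recorded. */
    static int get(String page) {
        synchronized (counts) {
            return counts.getOrDefault(page, 0);
        }
    }

    public static void main(String[] args) {
        record("/demo");
        record("/demo");
        System.out.println("/demo hits: " + get("/demo"));
    }
}
```

Nothing in the source changes between the one-JVM case and the many-JVM case, which is the whole appeal: the locking discipline you already need for thread safety is the same discipline the cluster relies on.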

It all sounds like science fiction until you try the tutorials, read the white papers and technical literature, and examine the long list of integration efforts (listed on the Terracotta website) involving other Java-based modules like Apache Lucene.

One of the more intriguing integration efforts thus far has been Geert Bevin's recent quest to achieve heretofore unknown levels of scalability and performance with the open-source Web CMS package, Drupal. Drupal is actually written in PHP, but in this case runs on Caucho's Quercus (a Java implementation of PHP), leveraging Terracotta in the cache layer. As Web CMS Report readers know, Drupal is a collaboration-intensive CMS solution of the "let's cache everything in the database" variety -- with difficult scalability problems to match. Bevin's system is highly experimental at this point, but it hints at what people might be able to accomplish with the technology.

In the meantime, other content technologies that take advantage of well-known Java subsystems like Hibernate, Tomcat, Resin, EHCache, Quartz, and so on have the most to gain by exploring Terracotta as a fast path to scalability. Individual subsystems can be tested against Terracotta separately, to find sweet spots.

It will be interesting to see how long it takes mainline ECM and Portal players (particularly those that rely heavily on Java-based infrastructure components) to include Terracotta in their "supported product configurations." I would expect the Alfrescos and Liferays of the world to stay out in front of the situation. Purveyors of complex proprietary solutions might miss the boat.

Scalability always has been (and probably always will be) the Achilles' heel of all the technologies we cover. I'll be watching to see how other communities adapt Terracotta-like notions to other well-known virtual machines (e.g., .NET). Anyone at www.mono-project.com listening?

