Recently I've been looking closely at content repositories. "Content repository" is a generic term adopted in the CMS community to describe a database that stores web content: fully rendered web pages, raw blog posts, templates and so on. Content repositories are generic databases which trade off efficiency for flexibility. There are quite a few alternative 'off the shelf' repositories out there, mostly either tightly bound into applications, or very loosely bound (usually serving XML). All of them have something you might want; none of them has everything.
What I really want is:
A simple repository specification that can be rapidly and efficiently implemented to provide local repository services on both client and server
The flexibility to link these services in the background using offline update agents
Independence from any specific language or server platform
Something that is reasonably efficient for traditional CMS applications
But what I really want is a platform on which to build new web applications really fast without too many compromises.
Some time ago the Java community identified the need for some standards in this area, and I couldn't agree more. The approach taken by the Java folks was to define an API, now enshrined in a standard, JSR 170. There is even a CPAN module for Perl users wanting to access the API, and a strong reference implementation (Apache Jackrabbit). Several commercial vendors sell JSR 170 implementations, and there's a dynamic community involved in the next revision of the spec. A number of open-source projects have layered JSR 170-compliant interfaces over their existing content repositories.
After playing with Jackrabbit for a while I found myself asking, why is it so darned complicated? The reference implementation is very heavy, but it has to be faithful to the spec. The API is defined in two levels: Level 1 is effectively a "read-only repository" and Level 2 a "read-write repository". Even Level 1 is extremely rich with features. To me it looks very much like an API designed by committee, where no one got their idea voted down. Having said that, it's really an excellent API. So, how about picking apart the API and seeing what parts an implementation really needs to provide? I'm not proposing anything more than a complementary API that is:
Able to leverage all the wonderful work the Java folks have done,
Language agnostic (not tied to Java, or any other specific language),
Simpler – a subset of JSR 170,
More amenable to implementation using commonly available tools and libraries,
Much less tied to XML/XPath.
So, I have embarked on this voyage of discovery. I have an initial implementation of a content repository coded in Perl, which leverages CPAN heavily. I've tried to stay true to the JSR 170 definition, except where language specifics dictate otherwise (e.g. Java uses overloaded method signatures that are not available in other languages). The goal is to produce a reference implementation of this cut-down API (working name WCR). If we feel it's powerful enough, then we'll move on to a high performance implementation in a proper language, and a web interface. But the key is the API, not the implementation. The idea is that an application written against WCR can also run on top of a standard JSR 170 implementation, assuming a compatible interface exists.
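To give a feel for how small such a subset could be, here is a rough sketch in Python of a tree of nodes with a read-only half and a read-write half. This is not the WCR API: the class and method names (Node, get_node, get_property, add_node, set_property) are hypothetical, only loosely modelled on JSR 170 naming, and the in-memory storage is purely an assumption for illustration.

```python
class Node:
    """Hypothetical content node: a name, some properties, and child nodes."""

    def __init__(self, name, parent=None):
        self.name = name
        self.parent = parent
        self.properties = {}   # property name -> value
        self.children = {}     # child name -> Node

    def path(self):
        """Absolute path of this node; the root node is '/'."""
        if self.parent is None:
            return "/"
        prefix = self.parent.path()
        return prefix + self.name if prefix == "/" else prefix + "/" + self.name

    # --- read-only half (roughly what a "Level 1" subset would cover) ---

    def get_node(self, rel_path):
        """Walk a slash-separated relative path; raises KeyError if missing."""
        node = self
        for part in rel_path.strip("/").split("/"):
            if part:
                node = node.children[part]
        return node

    def get_property(self, name):
        """Return a property value; raises KeyError if it is not set."""
        return self.properties[name]

    # --- read-write half (roughly what a "Level 2" subset would add) ---

    def add_node(self, name):
        """Create and return a new child node; names must be unique."""
        if name in self.children:
            raise KeyError("child already exists: " + name)
        child = Node(name, self)
        self.children[name] = child
        return child

    def set_property(self, name, value):
        """Set or overwrite a property on this node."""
        self.properties[name] = value


# Usage: build a tiny wiki-like tree, then read it back by path.
root = Node("")
page = root.add_node("wiki").add_node("FrontPage")
page.set_property("title", "Front Page")
found = root.get_node("wiki/FrontPage")
```

Anything that only reads content needs just the first half, which is the sort of split that would let a lightweight client-side repository stay small.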
The resultant content repository won't be tied to any specific purpose. It's possible that the first application of it will be a wiki, though.