Monday, March 19, 2007

Content Management Frameworks -- Tractare and Configuration

There are a large number of content management "frameworks" out there -- http://en.wikipedia.org/wiki/Content_management_framework lists just a few. What is a content management "framework"? In essence, it is a programmable API for creating customized content management systems. In other words, it is a programmer's way to create a CMS that supports all the stages of the content lifecycle: Organization - Workflow - Creation - Repository - Versioning - Publishing - Archives (or some subset thereof). A CM framework is the foundation on which a custom CMS can be built.

The key features I would expect to see in a CM framework include (http://www.cmsreview.com/Features/Lists.html):

  1. a way to acquire both text and non-text content (acquisition, aggregation, authoring)
  2. a way to store and retrieve content
  3. a way to control workflow (roles/permissions, checkin/checkout, messaging/routing)
  4. a way to control versioning
  5. a way to control personalization and localization
  6. an interface to administration (reporting, management, etc.)
  7. a way to control content delivery (extraction, slicing, publishing, syndication, update)
  8. a way to implement business rules
  9. and others...

The trick is making this work as a framework. How much needs to be customized by a programmer, and how much can be done by a very tech-savvy non-programmer?

In our CM Framework (Tractare), we permit most of these features to be customized by scripting. For example, many organizations have IT departments that mandate the use of a particular DBMS (Oracle or SQL Server, for example). So in Tractare, there is a fairly simple configuration setting that allows selection of the database interface driver. But, since Tractare is a CM framework, we also expose the driver interface so that a programmer can create a driver for a database for which we have not supplied one. We make heavy use of configuration files so that non-programmers can customize the CMS.
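To make the idea concrete, here is a minimal sketch of a pluggable storage driver chosen by a configuration setting. This is not Tractare's actual interface: the names `ContentStoreDriver`, `InMemoryDriver`, `DriverRegistry` and the `store.driver` key are all made up for illustration, and the in-memory driver simply stands in for a real database driver.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Properties;
import java.util.function.Supplier;

// Hypothetical driver contract that a programmer would implement
// for a database the framework does not support out of the box.
interface ContentStoreDriver {
    void store(String id, String content);
    String retrieve(String id);
}

// Toy driver that keeps content in memory; a real one would talk to a DBMS.
class InMemoryDriver implements ContentStoreDriver {
    private final Map<String, String> rows = new HashMap<>();

    public void store(String id, String content) {
        rows.put(id, content);
    }

    public String retrieve(String id) {
        return rows.get(id);
    }
}

// Maps driver names (as they would appear in a configuration file) to factories.
class DriverRegistry {
    private final Map<String, Supplier<ContentStoreDriver>> factories = new HashMap<>();

    void register(String name, Supplier<ContentStoreDriver> factory) {
        factories.put(name, factory);
    }

    // Reads the assumed "store.driver" setting and builds the matching driver.
    ContentStoreDriver fromConfig(Properties config) {
        String name = config.getProperty("store.driver");
        Supplier<ContentStoreDriver> factory = factories.get(name);
        if (factory == null) {
            throw new IllegalArgumentException("No driver registered for: " + name);
        }
        return factory.get();
    }
}
```

In a setup like this, the non-programmer only edits the `store.driver` value, while the programmer's job is limited to writing and registering a new `ContentStoreDriver`.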

Another area where we use scripting is in the interface elements (the "view" of the CMS). Internally, Tractare generates most interaction responses as an XML stream. This way, the entire user interface to Tractare can be tailored using XSLT scripts.
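As a rough illustration of the XML-plus-XSLT approach, the sketch below applies a stylesheet to an XML response using the standard `javax.xml.transform` API that ships with Java. The class name `XsltView` and the shape of the XML are assumptions for the example, not Tractare's real code or response format.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

// Hypothetical helper: turns an XML response into a page using an XSLT script.
class XsltView {
    static String render(String xml, String xslt) {
        try {
            // Compile the stylesheet, then run the XML response through it.
            Transformer transformer = TransformerFactory.newInstance()
                    .newTransformer(new StreamSource(new StringReader(xslt)));
            StringWriter out = new StringWriter();
            transformer.transform(new StreamSource(new StringReader(xml)),
                    new StreamResult(out));
            return out.toString();
        } catch (TransformerException e) {
            // Surface stylesheet or input problems as an unchecked error.
            throw new IllegalStateException("Could not render view", e);
        }
    }
}
```

Changing the look of a page then means swapping the stylesheet passed to `render`, with no change to the code that produced the XML.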

Depending on the application, however, Tractare could still require an investment in programming. It is written in Java and intended to run as a web application (using servlets, etc.), so extending the code is a job for a Java web programmer. But many features can be customized without this level of involvement.

Sunday, March 11, 2007

Open Publish Conference

I got to speak at the Open Publish conference in Baltimore (http://www.open-conferences.com/baltimore/) on Friday, 3/9/2007 amongst many of the content management and publishing luminaries. It was humbling. While the conference was small, it was very interactive and the audience was firing questions both during and after the talk. Kept me on my toes!

My particular talk was on topic maps, and on using topic maps as an information architecture to improve hypertext linking. The premise is that users frequently click on links embedded in web pages that take them somewhere they didn't want to go. With better knowledge captured about the "aboutness" of a link, we (as content providers) can give users more information and options about the links they are following. (Anyone interested can get the presentation slides from the conference directly, or contact me and I'll email them.)

My presentation was either fabulous or boring (depending on whether you're asking me or an attendee). But what I came away with were a bunch of questions about how to glean a topic set from extant content. I've spoken on the subject of topic maps on several occasions and many of the questions follow this same theme -- "How do we collect and organize the topics of the content in a meaningful way?"

I always have some lame answer -- "that's a job for the subject matter experts." While this is true -- the current state of the art is somewhat limited in its ability to parse content and determine subject matter -- it really dodges the question. Often (in my experience) there is something with which to begin this process. Usually, our content has already been touched in this way, from something as simple as the application of keywords to the creation and maintenance of an outside topical index and the positioning of a particular content object within it. And often there are tables of contents that can be used to further glean the "aboutness" of the content. What we (developers) need to consider is how to create a toolkit that both captures as much of this metadata as possible and imputes a relationship structure to it, one that at the least can serve as the foundation for a subject matter expert's work, and in the larger sense can provide a fully automated creation of a functional, integrated taxonomy.

At this conference, I was asked if I knew of any translation program that would convert the output of an index creation/management program (CINDEX) into a topic map -- a good example of what I'm talking about here.

I'm going to explore this further. We (Retrieval Systems) have a bunch of content processing tools, including some tools to work with the CINDEX data formats. Perhaps we can develop a reasonable toolkit for this kind of conversion.
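As a first, very rough sketch of what such a toolkit might do, the code below collects keywords and index (or table of contents) paths for content objects and turns them into draft topics with simple broader/narrower links. Everything here is hypothetical: the class names `Topic` and `TopicMapDraft` are invented, the input is assumed to have already been parsed into plain lists, and nothing is implied about the CINDEX file format or about a standard topic map serialization.

```java
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical draft topic: a name, the content objects it occurs in,
// and the names of narrower topics found beneath it in an index.
class Topic {
    final String name;
    final Set<String> occurrences = new LinkedHashSet<>();
    final Set<String> narrower = new LinkedHashSet<>();

    Topic(String name) {
        this.name = name;
    }
}

// Collects existing metadata into a starting point for a subject matter expert.
class TopicMapDraft {
    final Map<String, Topic> topics = new LinkedHashMap<>();

    // Returns the topic with this name, creating it on first use.
    Topic topic(String name) {
        return topics.computeIfAbsent(name, Topic::new);
    }

    // Each keyword applied to a content object becomes a topic occurrence.
    void addKeywords(String contentId, List<String> keywords) {
        for (String keyword : keywords) {
            topic(keyword).occurrences.add(contentId);
        }
    }

    // An index path such as ["publishing", "syndication"] links each level
    // to the next as broader/narrower, and files the content under the last level.
    void addIndexPath(String contentId, List<String> path) {
        Topic parent = null;
        for (String heading : path) {
            Topic current = topic(heading);
            if (parent != null) {
                parent.narrower.add(current.name);
            }
            parent = current;
        }
        if (parent != null) {
            parent.occurrences.add(contentId);
        }
    }
}
```

The result is only a draft. A subject matter expert would still need to merge synonyms, prune noise, and name the relationships properly.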