Sunday, March 11, 2007

Open Publish Conference

I got to speak at the Open Publish conference in Baltimore (http://www.open-conferences.com/baltimore/) on Friday, 3/9/2007, alongside many of the content management and publishing luminaries. It was humbling. While the conference was small, it was very interactive, and the audience fired questions both during and after the talk. Kept me on my toes!

My talk was on topic maps, and on using topic maps as an information architecture to improve hypertext linking. The premise is that users frequently click on links embedded in web pages that take them somewhere they didn't want to go, and that with better knowledge captured about the "aboutness" of each link, we (as content providers) can give users more information and more options about the links they are following. (Anyone interested can get the presentation slides from the conference directly, or contact me and I'll email them.)

My presentation was either fabulous or boring (depending on whether you're asking me or an attendee). But what I came away with was a bunch of questions about how to glean a topic set from extant content. I've spoken on the subject of topic maps on several occasions, and many of the questions follow this same theme -- "How do we collect and organize the topics of the content in a meaningful way?"

I always have some lame answer -- "that's a job for the subject matter experts." While this is true (the current state of the art is limited in its ability to parse content and determine subject matter), it really dodges the question. Often (in my experience) there is something with which to begin this process. Usually, our content has already been touched in this way, from something as simple as the application of keywords to the creation and maintenance of an outside topical index and the positioning of a particular content object within it. And often there are tables of contents that can be used to further glean the "aboutness" of the content. What we (developers) need to consider is how to create a toolkit that both captures as much of this metadata as possible and imputes a relationship structure to it. At the very least, that structure can serve as the foundation for a subject matter expert's work; at best, it can provide a fully automated creation of a functional, integrated taxonomy.
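To make the idea a little more concrete, here is a minimal sketch of what a first pass of such a toolkit might look like. It is only an illustration under assumptions: the function name and the "id", "keywords", and "toc_path" fields are hypothetical, and the broader/narrower relationships are guessed purely from table-of-contents nesting, which a subject matter expert would still need to review.

```python
from collections import defaultdict


def build_candidate_topics(documents):
    """Collect candidate topics and naive relationships from existing metadata.

    Each document is a dict with hypothetical fields (assumptions, not a
    real format):
      "id"       - identifier of the content object
      "keywords" - keywords already applied to the content
      "toc_path" - table-of-contents headings, outermost first
    """
    occurrences = defaultdict(set)  # topic name -> ids of content about it
    broader = defaultdict(set)      # topic name -> names of parent topics

    for doc in documents:
        # Every keyword becomes a candidate topic that the document is "about".
        for keyword in doc.get("keywords", []):
            occurrences[keyword.strip().lower()].add(doc["id"])

        path = [heading.strip().lower() for heading in doc.get("toc_path", [])]

        # Each adjacent pair of headings suggests a broader/narrower relationship.
        for parent, child in zip(path, path[1:]):
            broader[child].add(parent)

        # Register every heading as a topic, even if nothing is filed under it.
        for heading in path:
            occurrences[heading]

        # The innermost heading is where the document itself sits.
        if path:
            occurrences[path[-1]].add(doc["id"])

    return occurrences, broader
```

Calling it with a list of such dicts returns two mappings: which content objects each candidate topic points to, and which topics look like parents of which. That is nowhere near a finished taxonomy, but it is the kind of starting point I have in mind.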

At this conference, I was asked if I knew of any translation program that would convert the output of an index creation/management program (CINDEX) into a topic map -- a good example of what I'm talking about here.
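I don't have such a converter to point to, but the shape of one is easy to sketch. The example below assumes a made-up tab-delimited export (main heading, optional subheading, comma-separated locators); this is NOT the actual CINDEX data format, and the XML it produces is a simplified, topic-map-like structure rather than schema-valid XTM. The function and element names are illustrative only.

```python
import xml.etree.ElementTree as ET


def slug(name):
    """Turn a heading into a simple identifier usable as a topic id."""
    cleaned = "".join(c if c.isalnum() else " " for c in name.lower())
    return "-".join(cleaned.split())


def index_to_topic_map(lines):
    """Convert tab-delimited index records into a simplified topic map.

    Assumed record layout (hypothetical, not the real CINDEX format):
        main heading <TAB> subheading (may be empty) <TAB> locators
    """
    root = ET.Element("topicMap")
    topics = {}
    seen_pairs = set()

    def topic_for(name):
        # Create each topic once and reuse it on later records.
        topic_id = slug(name)
        if topic_id not in topics:
            topic = ET.SubElement(root, "topic", id=topic_id)
            ET.SubElement(topic, "name").text = name
            topics[topic_id] = topic
        return topics[topic_id]

    for line in lines:
        # Pad short records so there are always three fields.
        fields = (line.rstrip("\n").split("\t") + ["", ""])[:3]
        main, sub, locators = (field.strip() for field in fields)
        if not main:
            continue  # skip blank or malformed records

        parent = topic_for(main)
        target = parent
        if sub:
            # A subheading becomes its own topic, narrower than the main heading.
            target = topic_for(f"{main}: {sub}")
            pair = (parent.get("id"), target.get("id"))
            if pair not in seen_pairs:
                seen_pairs.add(pair)
                assoc = ET.SubElement(root, "association", type="broader-narrower")
                ET.SubElement(assoc, "member", role="broader", topic=pair[0])
                ET.SubElement(assoc, "member", role="narrower", topic=pair[1])

        # Each locator (page number, section id, ...) becomes an occurrence.
        for locator in locators.split(","):
            if locator.strip():
                ET.SubElement(target, "occurrence", locator=locator.strip())

    return root
```

Passing the result to `ET.tostring(root, encoding="unicode")` gives XML you can inspect. A real converter would also need to handle cross-references ("see" and "see also"), which map naturally onto additional associations.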

I'm going to explore this further. We (Retrieval Systems) have a bunch of content processing tools, including some tools to work with the CINDEX data formats. Perhaps we can develop a reasonable toolkit for this kind of conversion.