Carton's Content Blog: 2007

Friday, December 7, 2007

Offshore Development Woes

In his excellent article "Using an Agile Software Process with Offshore Development", Martin Fowler summarizes his experiences in using offshore development for large projects. While his particular emphasis is on agile versus waterfall development approaches and the specific twists for offshore agile development, he draws a startling conclusion regarding offshore development: "Certainly anyone doing it because they think they'll get cost savings similar to the rate differences is seriously deluding themselves."

This really hit home for me. We've been seriously examining our rates, partly because of our perception of downward pressure caused by the offshore "threat". But as I looked at it more, it seemed to me that there were several misperceptions going on here. First, it turns out that offshore arrangements we're competing with are under a contract for a minimum number of hours. Well, I'm certainly willing to offer a generous discount for a commitment like that! But more importantly, there is the hourly rate perception.

Just because the hourly rate is lower does not mean the project will be cheaper. I've always felt that way, but Martin Fowler supports it with evidence. Fact is, offshore development carries a large communications and distribution overhead that offsets the lower hourly rate.

He also says that "anyone who thinks that onshore developers will triumph because they are more skilled is very wrong. We've found that we can hire just as talented developers in India as we can in North America and Europe." And this has been my experience as well. The talent available there is excellent, though (IMHO) not better than we have here.

So, at the moment, it would seem to me to be a question, not of savings or talent, but of industry knowledge. The real savings should come about by using developers skilled and knowledgeable, or at least conversant in the industry area the development is supporting.

I say, the playing field is level, competition is good, let's sharpen our tools and have at it...

Thursday, November 15, 2007

Estimating Software Development

I'm an excellent software architect, developer, project manager, and even (as needed) sales person. But I can't estimate a project development effort to save my life. I waaay underestimate. Every time. I can lose money faster than a gambler in Las Vegas! So I figure I need to study the science (if such there is!) of estimating. In the past I've just thrown numbers into MS Project and hoped for the best.

I was initially turned on to this idea of a systematic approach to estimating by one of the smarter folks I know, Norbert Winklereth, formerly of Omnimark and Stilo. He espouses an approach I had never heard of -- Wide Band Delphi Blind (WBDB). Being a believer in Agile methods, this appeals to me. I started reading up on it and I like what I see. It depends on several people looking at the whole problem (Wide Band) to develop a list of tasks and an estimate for each (The Oracle of Delphi), operating anonymously to reduce or eliminate political influence or pressure (Blind). There's a fair writeup of it HERE.

There are a lot of parts to this, but the essence of it is to get several people working together to bring their collective expertise and experience to the issue of estimating an effort. And, it can be used to estimate a number of things, not just cost, but anything to which we can apply a unit of measure. One thing I began to see early on is that estimating the effort of a project and estimating the schedule are two completely different, though related things.

Then I read an article by Joel Spolsky (Joel on Software) on something he calls "Evidence Based Scheduling." Joel argues for (and, indeed, has a software product based on) creating schedules based on historical performance data (from time sheets or other records) combined with Monte-Carlo simulations. Again, recognizing the difference between effort and schedule, Joel makes the point that any task planned for more than 16 hours is not going to work and needs to be further factored. I think the same thing applies to estimating effort -- break it down into units of not more than 16 hours.

Another factor that Joel recognizes is the relative historical accuracy of an individual as an estimator - what he calls velocity, the ratio of an individual's estimate to the actual. A perfect estimator would always have a velocity of 1. A person's history of these ratios is his velocity history and is a factor in the Monte-Carlo simulation.
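To make the velocity idea concrete, here is a minimal sketch of the kind of Monte-Carlo simulation described above. It is my own simplified reading, not Joel's actual product: the function names, the sample estimates, and the velocity history are all made up for illustration. Each round draws one past velocity (estimate divided by actual) at random for every task and divides the new estimate by it.

```python
import random


def simulate_totals(estimates_hours, velocity_history, rounds=1000, seed=None):
    """Return a sorted list of simulated total hours, one entry per round."""
    rng = random.Random(seed)
    totals = []
    for _ in range(rounds):
        # For each task, draw one historical velocity (estimate / actual)
        # and divide the new estimate by it to get a plausible actual.
        total = sum(est / rng.choice(velocity_history) for est in estimates_hours)
        totals.append(total)
    totals.sort()
    return totals


def hours_at_confidence(totals, confidence):
    """Total hours that roughly `confidence` of the simulated rounds stayed at or under."""
    index = min(len(totals) - 1, int(confidence * len(totals)))
    return totals[index]


# Hypothetical data: five tasks of at most 16 hours each, and an estimator
# who has historically underestimated (velocities at or below 1).
estimates = [8, 16, 12, 4, 16]
history = [0.5, 0.6, 0.8, 1.0, 0.7]

totals = simulate_totals(estimates, history, rounds=2000, seed=42)
print("sum of raw estimates:", sum(estimates))
print("50% confidence:", round(hours_at_confidence(totals, 0.5), 1))
print("90% confidence:", round(hours_at_confidence(totals, 0.9), 1))
```

The point of the exercise is the spread: instead of a single number, we get a range of totals and can quote the figure we are, say, 90% likely to come in under.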

So, I'm thinking that a blend of these techniques may represent a reasonable estimating approach. We need to assemble a blind group, and a project leader who has and continues to build a velocity history for each blind participant. We need some Monte-Carlo simulation software, and maybe a tool to organize all this.

More to come

Friday, August 17, 2007

Agile and Budgeting

Scott Ambler has an excellent discussion of the relationship between scope, resources, and schedule in the "Iron Triangle" of software development and the need for flexibility in at least one of these areas to ensure project success. Scott concludes by outlining several scenarios in which one or more of the sides of the triangle can vary.

In my experience, the budget for a project is usually fixed, or at least constrained to a certain upper boundary. So there are really only two sides of the triangle that can vary -- the scope and the schedule. Of these two, I find that the schedule is usually less flexible than the scope -- often there is a market-driven date establishing an outer boundary on the development process.

Which leaves us with the scope of the project as the only negotiable or variable side of the triangle. This leaves the development manager in a somewhat difficult position -- how to satisfy the user community (whose focus is on features) and also the client bursar (whose focus is on budget and schedule). The key to this is to develop the most important features first and leave the less important features for the end of development. This, in turn, requires a frank and realistic discussion of feature priorities. So, at the outset of a project, perhaps the most important thing the development manager can do is to get a solid list of the features and their priority.
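As a rough illustration of "most important features first" under a fixed budget, here is a small sketch. The Feature class, the sample features, and the hour figures are hypothetical, and the strict cut-off (everything after the first feature that does not fit is deferred) is just one possible policy.

```python
from dataclasses import dataclass


@dataclass
class Feature:
    name: str
    priority: int          # 1 = most important
    estimate_hours: float


def plan_scope(features, budget_hours):
    """Split features into (planned, deferred), honoring priority order.

    Features are taken in priority order until the next one would exceed
    the budget; that feature and everything after it is deferred.
    """
    ordered = sorted(features, key=lambda f: f.priority)
    spent = 0.0
    for i, feature in enumerate(ordered):
        if spent + feature.estimate_hours > budget_hours:
            return ordered[:i], ordered[i:]
        spent += feature.estimate_hours
    return ordered, []


# Hypothetical feature list agreed with the stakeholders.
backlog = [
    Feature("search", priority=1, estimate_hours=40),
    Feature("rss feed", priority=3, estimate_hours=30),
    Feature("login", priority=2, estimate_hours=50),
]

planned, deferred = plan_scope(backlog, budget_hours=100)
print("planned:", [f.name for f in planned])
print("deferred:", [f.name for f in deferred])
```

With the budget and schedule fixed, the deferred list is what the frank conversation about priorities is really about.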

Wednesday, July 18, 2007

OOXML - The Votes are In!

On Friday, July 13th, INCITS V1 took a roll-call vote on our recommended position regarding DIS-29500, Office-Open XML. In what I imagine as a strictly partisan vote, we were not able to arrive at any consensus (a 2/3 majority). This, in spite of extending the time limit and even a certain amount of bullying pressure (or so it seemed to me) from some of the voting members. And the partisan division pretty much seemed to be Microsoft and their business partners on one side, and the rest of the membership on the other.

The motions we entertained were: "Yes, with comments", which means we recommend acceptance of DIS-29500 and have comments that should be, but are not required to be, addressed; "No, with comments", which means we recommend acceptance of DIS-29500 only if the comments are all addressed; and "Abstain, with comments" -- and I never did clearly understand that one! We did not entertain a straight-up yes or no vote.

I'm a little surprised by the outcome. Not so much that we could not agree on either a yes or a no, but that we could not reach a compromise position on abstaining. It seemed to me that the members voting yes were completely unwilling to meet in this middle position, voting down the abstain position along the same party lines. This unwillingness to compromise reinforces (to me, anyway) the feeling of being bullied -- the pro-OOXML membership essentially saying "my way or the highway".

It is of interest that in the past few months, 19 new members have joined this committee. And of those 19, 14 voted "yes". Certainly from my viewpoint, this looks like ballot-box stuffing.

Monday, July 9, 2007

OOXML -- What's Happening Here?

Lucky me, I'm a voting member of INCITS-V1 -- the sleepy standards group that has been working on such arcane things as TopicMaps. We're now working through the 6k+ pages of DIS-29500 -- Microsoft's proposed standard for Office Open XML (OOXML). And my first question is, why?

Now, I'm not invested in or in any way partisan to the so-called-competing Open Document Format (ODF) that is already a standard and is in the process of getting a new version standardized. ODF is a part of the open-office software suite. I only just recently learned that it existed at all.

And, by the way, I don't see this (as some apparently do) as a Microsoft v. IBM fight, though I do see it as a fight of some kind. For me, a "standard" should serve a couple of purposes: to promote interoperability and to enhance portability being chief among these.

So, why do we need two competing open document standards? The answer from the proponents of OOXML is that there is no way to augment ODF (the extant standard) to include the features of OOXML. This is partly because OOXML supports the Microsoft Office binary formats (from Office 97 forward, I think) and partly because "the process model is different".

I'm trying hard to understand this. If OOXML can only be implemented with knowledge of the internals of various versions of Microsoft's Office serialization formats, how does that promote either portability or interoperability, except, maybe, within Microsoft's product lines? And I just don't get the "different process models" response -- I would have thought that a standard, especially an XML-based standard, would be above that.

To me at least, the issue is more subtle. If we (V1) approve OOXML as a standard and it fails to pass the litmus test of promoting portability and interoperability, I think we weaken our standards process as a whole, making standards less trustworthy. Kind of like what's happened to Wikipedia; a phenomenal web resource that has been co-opted by corporate marketing efforts (try searching for "Windows").

Tuesday, May 15, 2007

XML: Implications and Opportunities

Everybody's heard of XML and all the hype that comes with it. But what is XML? And what does it mean for publishers?

Often, content is authored, edited and published using a markup form that only controls style and presentation and not meaning. Examples of this are word processor files and web pages. XML offers the chance to capture the meaning of the content in the markup. Instead of this:

Allen, Thomas B. Vanishing Wildlife of North America. Washington, D.C.: National Geographic Society, 1974.

We have this:

<citation>
<author>Allen, Thomas B.</author>
<title>Vanishing Wildlife of North America</title>
<publisher>
<name>National Geographic Society</name>
<place>Washington, D.C.</place>
<date>1974</date>
</publisher>
</citation>

From that XML we could generate the same citation or any of a number of other formats. But, with this XML, we can also find our content with searches such as "author=Allen" and so on.
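Here is a minimal sketch of both ideas, using Python's standard xml.etree.ElementTree module. The wrapping citations element and the two helper functions are my own assumptions for the sake of a runnable example; a real system would more likely use a stylesheet and a search index.

```python
import xml.etree.ElementTree as ET

# The citation from above, wrapped in a hypothetical <citations> container
# so that we have a collection to search.
CITATIONS_XML = """
<citations>
  <citation>
    <author>Allen, Thomas B.</author>
    <title>Vanishing Wildlife of North America</title>
    <publisher>
      <name>National Geographic Society</name>
      <place>Washington, D.C.</place>
      <date>1974</date>
    </publisher>
  </citation>
</citations>
"""


def format_citation(citation):
    """Render one <citation> element in the plain-text style shown earlier."""
    author = citation.findtext("author")
    title = citation.findtext("title")
    place = citation.findtext("publisher/place")
    name = citation.findtext("publisher/name")
    date = citation.findtext("publisher/date")
    return f"{author} {title}. {place}: {name}, {date}."


def find_by_author(root, surname):
    """Return every <citation> whose <author> text starts with the surname."""
    return [
        c for c in root.iter("citation")
        if c.findtext("author", default="").startswith(surname)
    ]


root = ET.fromstring(CITATIONS_XML.strip())
for match in find_by_author(root, "Allen"):
    print(format_citation(match))
```

The same elements drive both the formatted output and the "author=Allen" style of search, which is the whole point of marking up meaning rather than appearance.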

The key to understanding XML and its implications for publishers is "Content Use and Reuse." Let's look at a scenario from a project on which I'm currently working.

Current awareness (news) content is published daily. It was originally written in a proprietary markup format, for which there were lots of in-house custom tools. All the editorial staff know the markup language. New hires have a long learning curve. We replaced the proprietary markup with an XML form. We hid the XML behind a word-processor interface and publish in a variety of forms (old and new) from the XML. This is feasible because of the open standards basis of XML and the freely available tools for managing XML. But really, it's just economics. Saved money. A publisher embarking on this kind of restructuring can use any of a number of standardized content models as a starting point: ODF, DocBook, DITA, etc.

The real opportunity comes in the new ways the content can be extended and also reused. More can be done to describe the content, down to a very granular level. The XML form of the content can contain not only the content, but metadata describing the content. This might not appear in any one published version, but can be used for searching and linking applications. It can be published in many ways on different media through simple reformatting.

Semantic markup is the next big step. We capture not only the metadata, but also describe the "aboutness" of the content. From simple keywords to linked thesauri, to external, linked ontologies, we can capture many levels of domain meaning for the content.

Back to the GWU Summer Publishing Institute...

So, getting back to the GWU Summer Publishing Institute, and the topic on which I need to speak, "Agile vs. Traditional: Methods for Building a Software Infrastructure."

It seems to me that, after exploring what Agile Publishing might mean, it's clear that an Agile Publisher needs a software infrastructure that supports the key tenets. I take that to mean a content management and publishing infrastructure that supports the tenets of Agile Publishing:

Customer satisfaction by rapid, continuous delivery of useful product
Working product delivered frequently
Working product is the principal measure of progress
Even late changes in requirements are welcomed
Close, daily, cooperation between business people and developers
Face-to-face conversation is the best form of communication
Projects are built around motivated individuals, who should be trusted
Continuous attention to technical excellence and good design
Simplicity
Self-organizing teams
Regular adaptation to changing circumstances

So what kind of framework is that? I'm envisioning a content management and publishing system that easily pushes out releases on a frequent basis, allows content and structural changes to be readily made, enhances communication between participants, enables remote meetings, provides templates for good design, has easy-to use process and product review tools, enhances workflow with loose rules, and can be easily adapted to changes.

This goes a ways beyond a content management system, or any publishing system on the market today because it demands flexibility and adds project collaboration tools.

In "The Emerging Art of Agile Publishing", XML.COM, 3/8/2006, Michael Fitzgerald makes some specific suggestions about how to be an Agile Publisher:

"Build real trust through constant, informal communication
Interleave work processes between writers, editors
Share work openly, don't store it in secret silos
Store source files in an online repository and give all members of the team access to it
Agree on and share tools to reduce the waste of format conversions
Carve up your work into small, easily consumed and exchanged pieces
Keep track of what you are doing daily with some other suitable tool
Publish your work early and often, charge a little something and get free reviews
Let Herb do things his own way, but not on your team"

I would add to that, "Keep it simple".

So what software is needed? First and foremost, a communications tool. Especially in these days of telecommuting, getting face-time is hard. I like instant messaging and the Skype product in particular. Skype has secure chat and telephony in one kit. Second, a content management system that will permit easy and frequent releases of product. There are many of these, including several open source ones. Third, open project management. The basecamp.com approach seems nice, as does the offering from project.net. I'm personally in favor of an open project document repository for sharing project-related materials. This could also be used for openly managing project management materials, such as an MS Project plan.

Wednesday, May 2, 2007

Agile Publishing -- Continuous attention to excellence, good design, and simplicity

Principles Number 8 and 9 of the Agile Software Development approach. What do these mean for an Agile Publisher?

As with some of the previous principles, this is partly just good business and I feel these are all closely related. Simplicity => Good Design => Excellence.

Especially when developing a complex product with many content components, it is easy to design an overly complex, difficult to build and maintain product. Simplicity is the name of the game here. Also, good design -- keeping sight of both the immediate need and the growth room and designing with these in mind.

Tuesday, May 1, 2007

Agile Publishing -- Projects are built around motivated individuals, who should be trusted

Principle Number 7 of the Agile Software Development approach. What does this mean for an Agile Publisher?

It's all about trust. This also could be said of every project everywhere. There's nothing different here for an Agile publisher than for any other development effort. Michael Fitzgerald ("The Emerging Art of Agile Publishing", March 8, 2006, XML.COM) posits these tenets:

People must be trusted
Fewer but more competent people are needed
Organizations must live with the decisions developers make

These are key points. Trust inspires dedication, dedication reduces numbers of people. But (I hear you cry), organizations must live with the decisions developers make? Hard to swallow, that. But remember, a key point of the Agile approach is that the development team includes stakeholders. It is partly their role to ensure that the decisions the developers make are in the best interests of the product and the client. A better product will come of it.

Friday, April 27, 2007

Agile Publishing -- Close, daily, cooperation between business people and developers and Face-to-face conversation is the best form of communication

"Close, daily, cooperation between business people and developers" and "face-to-face conversation is the best form of communication" -- Principles Number 5 and 6 of the Agile Software Development approach. What do these mean for an Agile Publisher?

It's all about communication. This could be said of every project everywhere. There's nothing different here for an Agile publisher than for any other development effort. Michael Fitzgerald ("The Emerging Art of Agile Publishing", March 8, 2006, XML.COM) posits these tenets:

The culture of the organization must be supportive of negotiation
Organizations need to have an environment that facilitates rapid communication between team members

Bravo.

Agile Publishing - Even late changes in requirements are welcomed

Even late changes in requirements are welcomed -- Principle Number 4 of the Agile Software Development approach. What does this mean for an Agile Publisher?

As the publishing world changes to include more and diverse assets in a publication, we introduce the possibility (or even probability) that changes will occur during the product development life-cycle. These could be as simple as changes in artwork or as complex as an overhaul in related assets.

One of the significant features of the iterative nature of Agile development is that the stakeholders (client reps) can see if the product is taking shape as was expected. If not, changes are not only possible, but encouraged, at just about any point in the development cycle. This same benefit obtains for changes to the underlying business model -- should the market conditions for the product change, we can respond to these changes.

Retaining this ability to accept changes to the requirements, no matter when they occur in the development cycle, is a preeminent feature of Agile and applies equally to Agile publishing as it does to Agile software development.

Monday, April 16, 2007

Agile Publishing -- Working product is the principal measure of progress

Working software is the principal measure of progress -- Principle Number 3 of the Agile Software Development approach (http://en.wikipedia.org/wiki/Agile_software_development). Let's change this to "working product is the principal measure of progress" and explore what this might mean for an Agile Publisher.

Norbert Winklareth, one of the smartest people I know, asserts that the measure of value for which customers pay is not any traditional metric of development but features:

Value => Functionality That Works
Functionality => Set of Features
Therefore: "Features are the true measure of development"

It would seem that the same principle can apply to a publishing project -- a publication has, among other things, a set of features. In software, features are specific characteristics, such as a search function or an RSS feed. In publishing, features could be harder to describe. In essence, these are the characteristics of a product for which customers pay. As a practical matter, features could be publication components (or assets) for example, written text, artwork, multi-media software functions, summaries or abstracts, and so on. Each of these can be considered as a feature of the published product.

Deciding on the priority of the features of our product is important here. By developing the most important features first, we can plan the creation of our product in stages (or iterations) in which we get our most desirable or most important features built first. The result is that we have a working, functional product at the end of each iteration that could be considered a final product if need be. It certainly permits us to respond to budgetary or other outside business pressures and still end up with a publishing product.

It also provides us with a true metric for progress. As features are developed, we can measure that easily against our plan for the product. We then know how well our project is faring.

Saturday, April 14, 2007

Agile Publishing -- Working Product Delivered Frequently

Working software is delivered frequently (weeks rather than months) -- Principle Number 2 of the Agile Software Development approach (http://en.wikipedia.org/wiki/Agile_software_development). What does this mean for an Agile Publisher?

One of the key issues that arises in traditional software development is that the client's vision of the features and functionality of a product is different from the developers' vision. Another is the inability of the stakeholders to control development against their budget. These two issues are at least partially addressed through Agile's principle of frequent delivery of working software. Specifically, by getting frequent releases, clients can see for themselves how closely the developers' vision conforms to their own. Also, especially when combined with another concept -- planning for priority features in the early releases -- the project can potentially be terminated early, should the development cost begin to approach the project budget, with much of the priority feature set in working order. Powerful stuff.

Developing a publishing product can follow a similar flow to software development, one in which iterations of content are created and reviewed. This is especially true for products with content derived from many sources. Following a similar flow of development, we can exert the same control -- create iterations of working versions of the product and with more important features (included content sources perhaps) developed first.

In the realm of electronic publishing, it is almost always true that a product is derived from many sources, and I've seen many publishing efforts held up in order to include a particular content asset. I wonder if that asset was important enough to the product to justify the delayed release and the resulting delay in revenue.

Wednesday, April 4, 2007

Agile Publishing - Continuous Delivery of Working Product

Customer satisfaction by rapid, continuous delivery of useful software -- Principle Number 1 of the Agile Software Development approach (http://en.wikipedia.org/wiki/Agile_software_development). What does it mean for an Agile Publisher?

To my mind, this is one of the most important pieces of Agile. This is what ensures that what the customer envisions is what is being built and that if not, corrections/realignment can be done early and at the least cost. It has some secondary advantages as well: regularly tracking progress against a budget; early start on customer training; and easier collaboration with client/stakeholders, to name a few.

What does this mean to electronic publishers? Publishing describes a large, complex industry that delivers information to a user community. There are many forms of electronic publishing: newsletters, online books, references, electronic databases, and so on. There is a commonality to software development in that published products (or assets) go through a development/authoring process, a QA (or editing) process and delivery.

An Agile Publisher would apply the principle of "rapid, continuous delivery" to each electronic product. In practice this could mean author submissions of partial manuscripts, developer delivery of database subsets, and so forth. Any of these early and frequent data feeds provides a foundation for "rapid, continuous delivery", which then permits the goals of early and ongoing review and realignment, budget tracking, etc.

But, to take advantage of this, the electronic publisher must also have a deployment infrastructure that will create a working version of this electronic asset. In other words, the software and processes must be in place to publish the early edition of the assets.

Tuesday, April 3, 2007

Agile Software Development v. Traditional "Waterfall"

I've been involved in two projects recently in which I got a first hand opportunity to compare the results of software development under Agile and under the traditional "waterfall" method. The two projects were not really similar in any way, but they still bear some scrutiny for the results of the two different approaches.

The project done using a traditional (or waterfall) methodology (which I'll call "Project W") began with a requirements gathering phase in July 2004. This was followed by an architecture and design phase, programming, testing and acceptance testing. We will (hopefully) deploy this system to customers in June 2007, nearly three years after the project started.

The project done using Agile (which I'll call "Project A") was started in August 2006 and was delivered and live in production in December 2006, around five months after the start.

Now, there are some very big differences in these two projects, including the complexity of the underlying data models and the overall software architecture. However, it is interesting to note that a big part of the 3-year time to deliver on Project W came from two factors: redesign of system components that were found to be faulty during programming, and under-developed requirements. Especially in the past six months, we have spent a significant amount of developer resources adding/changing/deleting user features long after the requirements were solidified. In contrast, on Project A, at the point where the software was ready for testing, there were few feature changes.

Why? A significant aspect of Agile is that the client/user/stakeholder "sees" working versions of the software early on. So if there are any misunderstood requirements, these are found and fixed very early in the development life-cycle. Also, in Agile there is a much more limited design process, relying instead on developing working code. As a result, any design errors were uncovered earlier in the cycle.

An interesting thing happened on Project A -- we exceeded the budgeted costs. Although this was conveyed to the customer as it happened, it still caused a great deal of consternation. However, and in spite of that issue, I view the project as a huge success. Had we followed a "waterfall" paradigm, it would likely have cost the same or more and would have taken longer to complete. So although we missed our target budget, we delivered a market-ready product in five months, enabling the customer to begin generating revenues quickly.

I'm totally convinced that Agile is superior for the kinds of projects on which I tend to work.

Monday, April 2, 2007

Agile Publishing

I've been invited to speak at George Washington University on the subject of Agile Publishing. This is a new application of Agile programming principles applied to content rather than to code. So I'm starting to explore the relationship and applicability.

The basic tenets of Agile Software Development (taken from http://www.agilemanifesto.org/) are:

Individuals and interactions over processes and tools
Working software over comprehensive documentation
Customer collaboration over contract negotiation
Responding to change over following a plan

And Agile at work in software development usually means following these principles (http://en.wikipedia.org/wiki/Agile_software_development):

Customer satisfaction by rapid, continuous delivery of useful software
Working software is delivered frequently (weeks rather than months)
Working software is the principal measure of progress
Even late changes in requirements are welcomed
Close, daily, cooperation between business people and developers
Face-to-face conversation is the best form of communication
Projects are built around motivated individuals, who should be trusted
Continuous attention to technical excellence and good design
Simplicity
Self-organizing teams
Regular adaptation to changing circumstances

So, how does this apply to publishing? Well, there are both philosophical and practical answers. Michael Fitzgerald published an article on XML.COM on this at http://www.xml.com/pub/a/2006/03/08/agile-publishing.html. In it, he examines the basic tenets of Agile in a publishing context and makes some specific suggestions on how to become agile. I'd like to extend his work. In the next several blog entries, I'll take a look at each of the principles in a publishing context. But first, let's look at what makes for a successful Agile project:

The culture of the organization must be supportive of negotiation
People must be trusted
Fewer but more competent people are needed
Organizations must live with the decisions developers make
Organizations need to have an environment that facilitates rapid communication between team members

I believe this holds true for an Agile publishing effort as much as for an Agile software development effort. These key elements center around trust and communication. Find good people, empower them to succeed, trust them to get there and communicate regularly. It may mean compromising certain aspects of traditional publishing, such as corporate style oversight and high-overhead decision-making, and trusting decisions to workers.

Monday, March 19, 2007

Content Management Frameworks -- Tractare and Configuration

There are a large number of Content Management "Frameworks" out there -- http://en.wikipedia.org/wiki/Content_management_framework lists just a few. What is a "Content Management Framework"? In essence, it is a programmable API for creating customized content management systems. In other words, it is a programmer's way to create (using the framework) a CMS that supports all the stages of the content lifecycle: Organization - Workflow - Creation - Repository - Versioning - Publishing - Archives (or some subset thereof). Put simply, a CM framework is a foundation on which a custom CMS can be built.

The key features I would expect to see in a CM framework include (http://www.cmsreview.com/Features/Lists.html):

a way to acquire both text and non-text content (acquisition, aggregation, authoring)
a way to store and retrieve content
a way to control workflow (roles/permissions, checkin/checkout, messaging/routing)
a way to control versioning
a way to control personalization and localization
an interface to administration (reporting, management, etc.)
a way to control content delivery (extraction, slicing, publishing, syndication, update)
a way to implement business rules
and others...

The trick is making this work as a framework: how much needs to be customized by a programmer, and how much can be done by a very tech-savvy non-programmer?

In our CM Framework (Tractare), we permit most of these features to be customized by scripting. For example, many organizations have IT departments that mandate the use of a particular DBMS (Oracle or SQL Server, for example). So in Tractare, there is a fairly simple configuration setting that allows selection of the database interface driver. But, since Tractare is a CM framework, we also expose the driver interface so that a programmer can create a driver for a database for which we have not supplied one. We make heavy use of configuration files so that non-programmers can customize the CMS.
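To make the idea concrete, here is a minimal sketch of configuration-driven driver selection with an exposed driver interface. Every name in it (ContentStoreDriver, the store.driver setting, the three driver classes) is hypothetical and chosen for illustration; it is not Tractare's actual API or configuration format.

```java
import java.util.Map;
import java.util.Properties;
import java.util.function.Supplier;

// Hypothetical sketch: none of these names come from Tractare itself.
class DriverConfigSketch {

    /** Assumed driver contract that a framework could expose to programmers. */
    interface ContentStoreDriver {
        String name();
    }

    static final class OracleDriver implements ContentStoreDriver {
        public String name() { return "oracle"; }
    }

    static final class SqlServerDriver implements ContentStoreDriver {
        public String name() { return "sqlserver"; }
    }

    /** Stands in for a programmer-written driver the framework does not supply. */
    static final class FlatFileDriver implements ContentStoreDriver {
        public String name() { return "flatfile"; }
    }

    // Drivers supplied out of the box, keyed by the value a non-programmer puts in the config.
    static final Map<String, Supplier<ContentStoreDriver>> SUPPLIED = Map.of(
            "oracle", OracleDriver::new,
            "sqlserver", SqlServerDriver::new);

    /** Resolves the configured driver: a supplied key first, otherwise a class name. */
    static ContentStoreDriver selectDriver(Properties config) {
        String setting = config.getProperty("store.driver");
        if (setting == null) {
            throw new IllegalStateException("store.driver is not set");
        }
        Supplier<ContentStoreDriver> supplied = SUPPLIED.get(setting);
        if (supplied != null) {
            return supplied.get();
        }
        try {
            // Treat any other value as the class name of a custom driver.
            return Class.forName(setting)
                    .asSubclass(ContentStoreDriver.class)
                    .getDeclaredConstructor()
                    .newInstance();
        } catch (ReflectiveOperationException | ClassCastException e) {
            throw new IllegalStateException("Cannot load driver: " + setting, e);
        }
    }

    public static void main(String[] args) {
        Properties config = new Properties();
        config.setProperty("store.driver", "sqlserver");
        System.out.println(selectDriver(config).name());

        // A programmer plugs in a custom driver by naming its class.
        config.setProperty("store.driver", FlatFileDriver.class.getName());
        System.out.println(selectDriver(config).name());
    }
}
```

The split is the point: a non-programmer changes one setting to pick a supplied driver, while a programmer implements the interface and names the new class in the same setting.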

Another area where we use scripting is in the interface elements (the "view" of the CMS). Internally, Tractare generates most interaction responses as an XML stream. This way, the entire user interface to Tractare can be tailored using XSLT scripts.
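As a rough illustration of the XML-plus-XSLT approach, the sketch below renders an XML response into an HTML fragment using the standard javax.xml.transform API that ships with Java. The response element names and the stylesheet are invented for the example; they are assumptions, not Tractare's real response format.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

// Hypothetical sketch: the XML vocabulary here is made up for illustration.
class XsltViewSketch {

    // Assumed shape of an interaction response produced by the CMS.
    static final String RESPONSE_XML =
            "<response><title>Checked out</title>"
            + "<item id=\"42\">style-guide.xml</item></response>";

    // The "view": swapping this stylesheet changes the interface without touching Java code.
    static final String VIEW_XSLT =
            "<xsl:stylesheet version=\"1.0\""
            + " xmlns:xsl=\"http://www.w3.org/1999/XSL/Transform\">"
            + "<xsl:output method=\"xml\" omit-xml-declaration=\"yes\"/>"
            + "<xsl:template match=\"/response\">"
            + "<div><h1><xsl:value-of select=\"title\"/></h1>"
            + "<ul><xsl:for-each select=\"item\">"
            + "<li><xsl:value-of select=\".\"/></li>"
            + "</xsl:for-each></ul></div>"
            + "</xsl:template>"
            + "</xsl:stylesheet>";

    /** Applies the stylesheet to the XML response and returns the rendered markup. */
    static String render(String xml, String xslt) {
        try {
            Transformer transformer = TransformerFactory.newInstance()
                    .newTransformer(new StreamSource(new StringReader(xslt)));
            StringWriter out = new StringWriter();
            transformer.transform(new StreamSource(new StringReader(xml)),
                    new StreamResult(out));
            return out.toString();
        } catch (TransformerException e) {
            throw new IllegalStateException("Could not render view", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(render(RESPONSE_XML, VIEW_XSLT));
    }
}
```

In a deployed web application the stylesheet would more likely be loaded from a file named in configuration, so that someone comfortable with XSLT can restyle the interface without recompiling anything.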

Depending on the application, however, Tractare could still require an investment in programming. It is written in Java and intended to run as a web application (using servlets, etc.), so extending the code is a matter for a Java web programmer. But many features can be customized without this level of involvement.

Sunday, March 11, 2007

Open Publish Conference

I got to speak at the Open Publish conference in Baltimore (http://www.open-conferences.com/baltimore/) on Friday, 3/9/2007 amongst many of the content management and publishing luminaries. It was humbling. While the conference was small, it was very interactive and the audience was firing questions both during and after the talk. Kept me on my toes!

My particular talk was on topic maps, and on using topic maps as an information architecture to improve hypertext linking. The premise is that users frequently click on links embedded in web pages that take them somewhere they didn't want to go, and that with better knowledge captured on the "aboutness" of the link, we (as content providers) can give the user more information and options about the links they are following. Anyone interested can get the presentation slides from the conference directly, or contact me and I'll email them.

My presentation was either fabulous or boring (depending on whether you're asking me or an attendee). But what I came away with were a bunch of questions about how to glean a topic set from extant content. I've spoken on the subject of topic maps on several occasions and many of the questions follow this same theme -- "How do we collect and organize the topics of the content in a meaningful way?"

I always have some lame answer -- "that's a job for the subject matter experts." While this is true (the current state of the art is somewhat limited in its ability to parse content and determine subject matter), it really dodges the question. Often (in my experience) there is something with which to begin this process. Usually, our content has already been touched in this way, from something as simple as the application of keywords to the creation and maintenance of an outside topical index and the positioning of a particular content object within it. And often there are tables of contents that can be used to further glean the "aboutness" of the content. What we (developers) need to consider is how to create a toolkit that both captures as much of this metadata as possible and imputes a relationship structure to it that at the least can serve as the foundation for a subject matter expert's work, and in the larger sense can provide a fully automated creation of a functional, integrated taxonomy.

At this conference, I was asked if I knew of any translation program that would convert the output of an index creation/management program (CINDEX) into a topic map -- a good example of what I'm talking about here.
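A minimal sketch of what such a conversion might look like, assuming the index has already been exported to a simple text form: one entry per line, heading levels separated by ">", then a tab, then comma-separated locators. That input format is my own assumption for illustration, not CINDEX's actual data format, and the Topic class is a hypothetical stand-in for a real topic map model.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: the input format is assumed, not a real CINDEX export.
class IndexToTopicsSketch {

    /** A topic, the places it occurs, and the narrower topics beneath it. */
    static final class Topic {
        final String name;
        final List<String> occurrences = new ArrayList<>();
        final Map<String, Topic> narrower = new LinkedHashMap<>();

        Topic(String name) {
            this.name = name;
        }
    }

    /** Builds a topic hierarchy from lines like "XML > XSLT<TAB>ch3.html, ch7.html". */
    static Map<String, Topic> build(List<String> lines) {
        Map<String, Topic> roots = new LinkedHashMap<>();
        for (String line : lines) {
            if (line.trim().isEmpty()) {
                continue; // skip blank lines
            }
            String[] parts = line.split("\t", 2);
            Map<String, Topic> level = roots;
            Topic current = null;
            // Each heading level becomes a topic nested under the previous one.
            for (String heading : parts[0].split(">")) {
                current = level.computeIfAbsent(heading.trim(), Topic::new);
                level = current.narrower;
            }
            // Locators become occurrences of the deepest heading on the line.
            if (current != null && parts.length == 2) {
                for (String locator : parts[1].split(",")) {
                    if (!locator.trim().isEmpty()) {
                        current.occurrences.add(locator.trim());
                    }
                }
            }
        }
        return roots;
    }

    /** Prints the hierarchy with indentation, one topic per line. */
    static void print(Map<String, Topic> topics, String indent) {
        for (Topic topic : topics.values()) {
            System.out.println(indent + topic.name + " " + topic.occurrences);
            print(topic.narrower, indent + "  ");
        }
    }

    public static void main(String[] args) {
        List<String> index = List.of(
                "XML > XSLT\tch3.html, ch7.html",
                "XML\tch1.html",
                "Topic maps > Associations\tch9.html");
        print(build(index), "");
    }
}
```

The result is only a starting structure: headings become topics, subheadings become narrower topics, and locators become occurrences. A subject matter expert would still need to merge synonyms and add the associations that an index does not record.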

I'm going to explore this further. We (Retrieval Systems) have a bunch of content processing tools, including some tools to work with the CINDEX data formats. Perhaps we can develop a reasonable toolkit for this kind of conversion.
