Friday, March 28, 2008

Using Folksonomies in Content

I've been thinking more and more about folksonomies as a replacement for traditional taxonomies. We can create all the tools we need, or work off of existing tools such as But in the end, size does matter. For a folksonomy to work, a *lot* of people have to look at and tag the content. To me, this means content has to be exposed to readers, and lots of them, to tag the content.

Traditionally, the classification operation has been something done under control and part of back-room content management, by a small, select group of indexers. So first, we have to decide to relinquish some control. While this may seem scary, in reality the volume of tagging makes up for the lack of specific control. And, we can build the tagging tools so that editors and indexers can follow behind the tagging and clean it up. But if we can achieve a large volume of tagging, the volume and repetitive nature of the tagging will create common tags, threads and relationships.
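To make the "volume makes up for control" idea a little more concrete, here is a minimal sketch of how repeated tags could be surfaced from many taggers. Everything in it is an assumption for illustration: the function name `common_tags`, the `min_users` threshold, and the sample taggers are all made up, and a real tool would need smarter normalization than lower-casing.

```python
from collections import Counter

def common_tags(taggings, min_users=3):
    """Return (tag, count) pairs for tags applied by at least `min_users`
    distinct taggers, most common first. `taggings` maps tagger -> list of tags."""
    counts = Counter()
    for user_tags in taggings.values():
        # Normalize case and whitespace, and count each tag once per tagger.
        counts.update({t.strip().lower() for t in user_tags if t.strip()})
    return [(tag, n) for tag, n in counts.most_common() if n >= min_users]

# Hypothetical sample data: four taggers looking at the same content item.
taggings = {
    "ann": ["XML", "cms", "tagging"],
    "bob": ["xml", "Tagging", "todo"],
    "cat": ["xml ", "tagging", "cms"],
    "dan": ["xml", "my-stuff"],
}
result = common_tags(taggings, min_users=3)
```

The one-off tags ("todo", "my-stuff") fall below the threshold, which is the kind of list an editor or indexer could then review and clean up.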

Second, we need to find a large group of taggers. Depending on the nature of the content, this can be accomplished in a couple of ways. First, and perhaps easiest, is to expose content to the web. Perhaps through incentives, taggers can be enticed to tag. And, of course, staff of the publisher should be encouraged to participate as well. Failing that, a publisher could look at a human-automation engine such as Amazon's Mechanical Turk, where large numbers of minuscule tasks that are best done by people are spread out over many people for a small fee.

Thursday, March 27, 2008

Jobs (or Maybe Just Job)

Retrieval Systems Corporation (my employer) needs an experienced developer to serve as a project lead, closing out an ongoing project. The project uses SQL Server, IIS, Exchange/Outlook and VB.NET to track document filings and enforcement activities through a legal process. We are currently wrapping up the initial data entry and enforcement status components of the application. The next phases will involve integration with QuickBooks and SharePoint services.

The individual we are looking for will be a professional software developer with a minimum of three years' experience. They will need a solid knowledge of SQL, .NET, and developing within the Microsoft web technologies. The job is for a project lead who will be expected to work on all aspects of the project life cycle.

Retrieval Systems Corporation is a developer of custom software. Our developers work in small teams and are involved in most aspects of a project. We are located at 2071 Chain Bridge Rd., Suite 510, Vienna, VA 22182. Work on this project will be performed on site at our offices.

Thursday, March 20, 2008

Agile for Consultant Development Teams

I'm a consultant (sounds like an admission at the start of a 12-step program). I love the Agile approach to software development but there seems to be a fundamental conundrum I haven't been able to solve -- software project pricing. Seems like my clients want to know, in advance, how much the development of software is going to cost (the nerve of some people!). I just can't see how to get a price together for a project using an approach that essentially leaves out the up-front design effort that is characteristic of the traditional waterfall approach. And the Agile experts with whom I've spoken have no answer to this either.

Now, I don't actually believe that the waterfall approach gives any better or more realistic cost estimate, except perhaps on the most simplistic of projects. If anything, I believe it is a more expensive method that produces a false sense of cost security. But, I'm not getting into a comparison of these approaches. I'm just trying to figure out how to give a customer a price for an Agile project. And I've tried to say things like "what is your budget? We'll do as many of your priorities as possible within that." There is a disconnect between what clients want to know (how much for the whole thing) and the reality of what the "whole" thing needs to be and how much it will really cost.

One approach would be to simply lie. Tell the client what they want to hear. And then manage the consequences in the middle of the project. Doesn't seem like the right way though.

So instead, we've been trying to educate our clients on the tension between cost and features. In essence, we've been using a variation on Wide-Band-Delphi-Blind to create estimates. We take the full list of features, attempt to designate tasks for implementing them, and then send that around to as many senior developers as we can -- developers with the right skill set to create an estimate for that type of project. Each estimator does his estimate alone and sends the results back to a coordinator and then we go through the estimates collectively to resolve the differences. Using this estimate of all features we then attempt to explain to our client that the estimate is under their control -- choose high priority features to implement first and then work through the list until either features or budget are exhausted.
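The arithmetic behind that process can be sketched roughly as follows. This is only an illustration under my own assumptions, not our actual tooling: the function names, the use of the median to combine estimates, the 50% spread threshold for flagging a feature for discussion, and all the sample numbers are hypothetical.

```python
from statistics import median

def reconcile(estimates_by_feature, spread_limit=0.5):
    """Combine independent estimates (in hours, assumed positive) per feature.

    Returns {feature: (median_hours, needs_discussion)}. A feature is flagged
    for group discussion when (max - min) / median exceeds `spread_limit`.
    """
    out = {}
    for feature, hours in estimates_by_feature.items():
        mid = median(hours)
        spread = (max(hours) - min(hours)) / mid
        out[feature] = (mid, spread > spread_limit)
    return out

def plan_within_budget(prioritized, hours_by_feature, rate, budget):
    """Walk features in the client's priority order and stop at the first
    one that no longer fits in the remaining budget."""
    chosen, spent = [], 0.0
    for feature in prioritized:
        cost = hours_by_feature[feature] * rate
        if spent + cost > budget:
            break
        chosen.append(feature)
        spent += cost
    return chosen, spent

# Hypothetical estimates from three developers, in hours.
estimates = {"login": [16, 20, 24], "search": [40, 80, 60], "reports": [30, 32, 34]}
reconciled = reconcile(estimates)
hours = {feature: mid for feature, (mid, _) in reconciled.items()}
chosen, spent = plan_within_budget(["login", "search", "reports"], hours,
                                   rate=100, budget=9000)
```

With these made-up numbers, "search" gets flagged because the estimators disagree widely, and the budget covers "login" and "search" but not "reports". That is the conversation we want to have with the client: the total is driven by where they draw the line in the priority list.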

This is not too different from agile poker. As I understand that technique, several estimators each make a bid for how long each task or feature will take. The actual bids are not really hours. They are units. After several rounds of bidding and discussion on differences between bids, the winning bids are turned into dollars by applying the development team's velocity to the bid and figuring out real hours. This means that we have to know that velocity -- not an easy problem in its own right.
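As a rough sketch of the units-to-dollars step, assuming velocity is measured as units completed per iteration and that we know how many team hours go into an iteration (the function name, parameters, and numbers here are all hypothetical):

```python
def cost_from_points(points, velocity, team_hours_per_iteration, hourly_rate):
    """Convert estimation units ("points") into hours and dollars.

    velocity: points the team completes per iteration (assumed known).
    team_hours_per_iteration: total billable hours the team works per iteration.
    """
    if velocity <= 0:
        raise ValueError("velocity must be positive")
    hours_per_point = team_hours_per_iteration / velocity
    hours = points * hours_per_point
    return hours, hours * hourly_rate

# Hypothetical numbers: 40 points of work, a team that finishes 20 points
# in a 160-hour iteration, billed at $100 per hour.
hours, dollars = cost_from_points(40, velocity=20,
                                  team_hours_per_iteration=160, hourly_rate=100)
```

The whole calculation hinges on the velocity figure, which is exactly the number a new team or a new kind of project doesn't have yet.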

But we end up in the same place -- with an estimate for the whole project which we then have to "sell" to the client. Is there a better way?

Thursday, March 13, 2008

Bill Gates at the NVTC Breakfast Forum

I got the opportunity to attend the Northern Virginia Technology Council (NVTC) breakfast this morning, in DC at the Capital Hilton. The speaker was Bill Gates. You know, that Microsoft guy? It was kind of cool to be a couple of hundred feet away from the richest man in the world (or maybe second now that Warren Buffett's star has been rising). Bill spoke at length on the direction computing is headed and mentioned everything from surface computing (your desk working for you?) to intention recognition through visualization. Very cool stuff, though I think I'll make sure I stay dressed in front of the PC-CAM.

And, I am fascinated by the concepts of surface interfaces -- as an active sailor, I'm envisioning a future sailboat that has a surface navigation computer that ties together mapping/charting, GPS, depth sounding (and underwater 3-D profiling), routing and guidance, with timely updates on chart data from the NOAA Notices to Mariners, coupled with a finger-driven interface similar to what Apple now uses on their iPods and iPhones (oh, wait, that's not from Microsoft).

But, I have to say, I was not "wow-ed" by Bill Gates' talk. I've heard him speak before and it's pretty much the same spiel I heard 20 years ago at a Microsoft CD-ROM Multimedia conference -- better human interaction with machines.

There were a couple of questions from the audience after the talk, and one questioner pounced on Microsoft for considering security as an afterthought (hear, hear). But my real disappointment is that Bill Gates' vision did not account for any of the real-world problems we seem to be facing: global warming, rising energy costs, environmental issues and so on. Where is computing really going to be in 20 years if, as Al Gore is suggesting, sea level has risen enough to flood some of our coastal cities? Or if we can no longer tolerate the environmental impact of hardware disposal?

So, I throw out a challenge: "What will computing look like in an age where we have higher sea levels, unaffordable energy, and intolerable environmental issues?" Bill?

Wednesday, March 12, 2008

Folksonomies Applied

In a previous blog entry, I was starting to think about folksonomies as they might apply to content management. Having lots of experience in topical classification, thesauri, and related tools, I figured this would be a simple discussion, but it isn't.

What is a Folksonomy? "Folksonomy (also known as collaborative tagging, social classification, social indexing, and social tagging) is the practice and method of collaboratively creating and managing tags to annotate and categorize content. In contrast to traditional subject indexing, metadata is not only generated by experts but also by creators and consumers of the content. Usually, freely chosen keywords are used instead of a controlled vocabulary" (Wikipedia).

By their nature, folksonomies are created by the people. I think the key to a successful folksonomy is participation by many people -- we make up for the lack of a controlled, standardized vocabulary and its application to content by sheer volume and enthusiasm from a widespread user community. In fact, complaints about this approach usually center around the imprecise nature of the tagging. Since users typically apply tags to content, the tags are often ambiguous, overly personalised and inexact. But, Guy and Tonkin make a persuasive argument that user applied tags are in fact converging -- that the overall universe of applied tags is becoming self-limiting. If so, then the universe of tags that are being created in services like Flickr and are becoming useful bases for classifications.

And in other, related developments, these services are beginning to categorize their tags. Especially at where there are now classification tags and action tags, among others. To me these seem like what we gray-beards call facets.

But that's not really my point, though I think it is important. I think we need some tools that can work with content management to allow tagging, maybe even super tagging, wherein the tags are members of controlled facets. This isn't really hard. Virtually every content management system "knows" about content by a URI. And there are some very cool features in the service, including keeping my bookmarks and tags private, and retrieving them via an API later. So, we can set a bookmark in containing the URI of the content we want to tag with the tags we want for that content.
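To show what I mean by tags that are members of controlled facets, here is a minimal sketch. The `facet:value` tag syntax, the facet names, and the allowed values are all my own assumptions for illustration; no particular tagging service is implied to work this way.

```python
# Hypothetical controlled facets and their allowed values.
CONTROLLED_FACETS = {
    "type": {"article", "manual"},
    "action": {"review", "publish"},
}

def split_tags(tags, facets=CONTROLLED_FACETS):
    """Separate tags into controlled facet values and free-form tags.

    A tag counts as faceted only if it looks like "facet:value" and the
    value is in that facet's controlled list; everything else stays free.
    """
    faceted, free = {}, []
    for tag in tags:
        facet, sep, value = tag.partition(":")
        if sep and value in facets.get(facet, ()):
            faceted.setdefault(facet, []).append(value)
        else:
            free.append(tag)
    return faceted, free

faceted, free = split_tags(["type:manual", "action:review", "sailing", "type:poem"])
```

Here "type:poem" lands in the free-form pile because "poem" is not in the controlled list, so the folksonomy side stays open while the facets stay clean.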

I tried this very simply by registering at, turning on the "private bookmarks" setting, and putting the buttons on my browser toolbar. Then I pointed my browser at a content item in a CMS (Alfresco) and clicked the "Tag" button. Added tags and saved it. I can see the tags and the URIs in my items on the website.

But, to actually use this information, we need to pull the tags and content URIs back out of XML to the rescue. Or, rather, XML and the API. We can fetch the tags we're using by using this URL: We can also see all of our content using this URL: And we can retrieve by tag:
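Once the XML is in hand, turning it into something a CMS could use is straightforward. The sketch below assumes a response shaped as a list of `post` elements, each with an `href` attribute (the content URI) and a space-separated `tag` attribute; that shape, the sample URIs, and the function name are assumptions for illustration, so check the service's real output before relying on it.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

# Hypothetical sample response; the real element and attribute names may differ.
SAMPLE = """<posts>
  <post href="http://cms.example.com/doc/1" tag="xml cms"/>
  <post href="http://cms.example.com/doc/2" tag="xml"/>
</posts>"""

def index_by_tag(xml_text):
    """Build a {tag: [content URIs]} index from the bookmark XML."""
    index = defaultdict(list)
    for post in ET.fromstring(xml_text).iter("post"):
        for tag in post.get("tag", "").split():
            index[tag].append(post.get("href"))
    return dict(index)

index = index_by_tag(SAMPLE)
```

The resulting dictionary maps each tag to the content URIs carrying it, which is the piece a content management system would need to look up content by tag.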

Pretty cool stuff.

Friday, March 7, 2008

Library Takes 'Talking Books' Digital, Washington Post, March 5, 2008

I love this article in the Washington Post. For a couple of reasons. First, I have several friends who are legally blind (which I think is different from "illegally blind"). These kinds of tools help them enormously. It makes me feel like, as a society, we're doing the right things at least sometimes.

But, and perhaps more importantly, I'm thrilled to see this because WE DEVELOPED IT. That's right, folks, the excellent programming staff at Retrieval Systems (who pay me) wrote the underlying software for the digital talking book for the Library of Congress, National Library Service for the Blind and Physically Handicapped (NLS). So this makes us proud.

We've been doing work in the background for the blind community for many years. Mostly this has involved library data exchange. For both NLS and the Recording for the Blind and Dyslexic (RFB&D) we wrote bunches of code to help them exchange bibliographic records, supporting creation of a union catalog so that blind patrons could more readily find books. And for NLS we've also developed some production tools to help in the creation of digital talking books.

But the DTB player is something of which we're especially proud. Not only was it technically challenging, it was socially responsive.