Progress, slow but steady

Having squashed my scoring bug, I have spent most of the weekend working on ensuring that parameters are being classified correctly. It took hours of trial and error, but the system is finally producing consistent and correct classifications. Now the task is to make improvements to the classification system. At present, I’m only getting a few percent improvement over a straight syntactic match. But much of that could be in the weighting of the syntax and synonym evaluations. Perhaps adding more classifiers will help, but I hope I can get an improvement by tweaking the comparisons I am already making.

One bug down

Success! At least, a limited form of success. I managed to find the cause of my parameter scoring bug. It was indeed simple and it seems the one fix solved many (hopefully all) of the symptoms. Even better, I think I also found the cause of the memory issue that has recently cropped up. I’m running the fixes now to make sure they work.

It also appears that my ontology builder is working properly. Once I build an ontology out of enough services I should know whether or not my scoring algorithm is effective. It’s working as designed, but whether that is what it needs to do is another matter entirely.

In the meantime, having squashed my bug, I think I have earned a cigar and a glass of Scotch. And I intend to collect my earnings immediately.

Bug hunting

What a pain. I’ve spent about 4 hours total today running a bug to ground and I came across a couple of other major bugs in the process. It seems these problems always multiply instead of collapsing into fewer than first appeared. The good news is, it should be a relatively quick fix; it will just take some tedious stubby-pencil analysis to get it right this time.

But as I have to coordinate the decorating for my church’s Vacation Bible School tomorrow night, I guess my bug hunting may have to wait for a day or so. If I can get it fixed on Friday night, I may be generating ontologies by Saturday. I’d like to get a paper submitted for a conference whose deadline is August 9th. I still have a fighting chance of making it, although I wouldn’t bet the rent money on it.

Learning the Protege API

I spent a couple of hours this evening learning the basics of the Protege API. It’s pretty easy to read in an ontology and read out or create new parts of it. But now I have to figure out how to take specific evaluated service parameters and create new named classes and individuals from them in the ontology. If I can do that, I will have some real results to show. Well, maybe not “real” results, but certainly some intermediate results.

There are a few display issues I’ll have to work on, but they shouldn’t be a problem. The hard part is digging through all of the documentation to figure out how to suppress (or eliminate) the excess text that gets appended to the class names. That may or may not work; there are still gaps in the Protege documentation. I spent an hour finding out why I was getting an error saying that OntologyLoadException could not be resolved. I made sure I was importing all the jar files the programmer’s guide said should be imported, but it neglected to mention that protege.jar from the main Protege directory needed to be imported as well. Perhaps the computer science types who are creating Protege at Stanford should team up with some humanities majors to check their documentation. Just a thought.

Sometimes progress is slow

It seems like I have done nothing all weekend. I have no additional code, no improvements to my algorithm, no drafts of papers to show for two days (I took Friday off — even students are allowed a little relaxation).

However, I did spend my time learning the new interfaces for Protege 4.1 because it supports OWL 2. Since my research should be founded on the latest standards, I assumed this was the way to go. It took a little while, but I learned how to use the new user interface to create an ontology.

And then I realized that the Protege 4.1 release, being a beta release, does not include the Protege OWL API. Given that I want to generate an ontology from a set of web service parameters, the OWL API is something of a necessity. Lesson learned about researching the features and limitations of beta software.

So I spent the remainder of the weekend going through the tutorial for Protege 3.4.4. It doesn’t support OWL 2, but it does include the API I wanted. So I reviewed the tutorial and built a basic ontology (in reality, more like a meta-ontology). The next task is to learn the API so I can work on generating an ontology from the service parameters I have evaluated.

Not the most productive weekend I have had, but definitely a learning experience. And it certainly could have been worse.

Making Decisions

Until now, I have had a simple system that uses several agents to parse a WSDL and drop its contents into a database, with some other agents that analyze the individual input and output parameters for each service and try to find matches for them among the parameters already in the database. Each match is loosely ranked based on how exact a match it is to any other parameter.

As of this evening, I am finally doing something with those scores. I added code that will take the set of matching parameters and their scores, apply a configurable weight to each type of match, and evaluate them to find the highest scoring match. That match is then stored in the database with the original parameter. This should give me some reasonable progress to show my advisor the next time we meet. It’s not much, but it is enough of a framework that I should be able to add an arbitrary number of evaluator agents and have all their scores accounted for and evaluated.
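As a rough sketch of that weighting step (not the actual agent code), the selection might look something like the following. Every name here is hypothetical: the MatchSelector class, the two match types, and the assumption that raw scores fall between 0 and 1 are all stand-ins for illustration.

```java
import java.util.ArrayList;
import java.util.EnumMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: all class, field, and method names are illustrative assumptions.
class MatchSelector {

    // Kinds of comparison an evaluator agent might perform (assumed categories).
    enum MatchType { SYNTACTIC, SYNONYM }

    // One candidate match: the matched parameter, how it was matched, and a raw score.
    static final class Candidate {
        final String parameterName;
        final MatchType type;
        final double rawScore; // assumed to lie in [0, 1]

        Candidate(String parameterName, MatchType type, double rawScore) {
            this.parameterName = parameterName;
            this.type = type;
            this.rawScore = rawScore;
        }
    }

    private final Map<MatchType, Double> weights = new EnumMap<>(MatchType.class);

    // Configure the weight applied to every score of the given match type.
    void setWeight(MatchType type, double weight) {
        weights.put(type, weight);
    }

    // Raw score times the configured weight; unconfigured types default to a weight of 1.0.
    double weightedScore(Candidate c) {
        return c.rawScore * weights.getOrDefault(c.type, 1.0);
    }

    // Return the candidate with the highest weighted score, or null if the list is empty.
    Candidate best(List<Candidate> candidates) {
        Candidate best = null;
        for (Candidate c : candidates) {
            if (best == null || weightedScore(c) > weightedScore(best)) {
                best = c;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        MatchSelector selector = new MatchSelector();
        selector.setWeight(MatchType.SYNTACTIC, 1.0);
        selector.setWeight(MatchType.SYNONYM, 0.5);

        List<Candidate> candidates = new ArrayList<>();
        candidates.add(new Candidate("zipCode", MatchType.SYNTACTIC, 0.5));
        candidates.add(new Candidate("postalCode", MatchType.SYNONYM, 0.75));

        Candidate winner = selector.best(candidates);
        System.out.println(winner.parameterName + " " + selector.weightedScore(winner));
    }
}
```

Keeping the weights in a map keyed by match type means a new evaluator agent only needs a new enum value and a weight; the selection loop itself does not change.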

Now comes the tricky part: figuring out whether all these matches form any sort of reasonable ontology.

Where it stands as I start this blog

My doctoral research is focused on improving the use of web services and attempting to make it possible for end users (i.e., non-programmers) to modify the capabilities of their applications by adding new web services or composing existing web services into new workflows at run time. A major promise of web services has been that they will enable just that: users will be able to find new web services as they are published and add them to their applications without recompiling.

Previous efforts in this arena have focused on one of two main approaches: using purpose-built frameworks that channel all web service development to conform to specific standards that foster interoperability; or requiring developers to generate significant amounts of semantic metadata that is then appended to the web service description.

My approach is based on the belief that web service interface descriptions (generally in the form of Web Services Description Language (WSDL) documents) contain within them an implicit ontology, and that it is possible to tease out that ontology and match it to similar ontologies in other WSDL documents. Put another way, I believe that the structure and content of the WSDL interface description reflect the developer’s understanding of the information, and thereby the structure and relationship of the different input and output elements.

Parsing out this sort of information is traditionally the province of artificial intelligence researchers, but I am hopeful that by using a multitude of simple, task-specific agents, I can predict parameter matches with sufficient fidelity to enable me to state with some degree of accuracy that the outputs of Service A can be matched to the inputs of Service B, and that A and B can therefore be composed into a viable workflow.

I am using an agent-based framework (the Java Agent Development Framework) to develop a proof of concept application. So far, I am making slow but (hopefully) steady progress. I have a basic agent set that will parse a WSDL document into its operations, the input and output messages for each operation, and the individual parameters for each operation. All of these are stuck into a database for ease of access when I need them. I’ll fill in more details as I have time.
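To give a feel for the parsing step, here is a minimal sketch that pulls operations, their input and output messages, and the message parts (the parameters) out of a WSDL 1.1 document. It uses only the JDK's built-in DOM parser rather than JADE agents or a database, and the WsdlOutline class and the tiny GetPrice sample document are invented for illustration.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

// Hypothetical sketch: class name and sample document are illustrative assumptions.
class WsdlOutline {
    static final String WSDL_NS = "http://schemas.xmlsoap.org/wsdl/"; // WSDL 1.1 namespace

    // A made-up, minimal WSDL fragment with one operation and two messages.
    static final String SAMPLE =
        "<definitions xmlns='http://schemas.xmlsoap.org/wsdl/' "
      + "xmlns:xsd='http://www.w3.org/2001/XMLSchema' "
      + "xmlns:tns='urn:example' targetNamespace='urn:example'>"
      + "<message name='GetPriceRequest'>"
      + "<part name='itemId' type='xsd:string'/><part name='currency' type='xsd:string'/>"
      + "</message>"
      + "<message name='GetPriceResponse'><part name='price' type='xsd:decimal'/></message>"
      + "<portType name='PricePort'><operation name='GetPrice'>"
      + "<input message='tns:GetPriceRequest'/><output message='tns:GetPriceResponse'/>"
      + "</operation></portType>"
      + "</definitions>";

    // Parse XML text into a namespace-aware DOM document.
    static Document parse(String xml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            return factory.newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not parse WSDL", e);
        }
    }

    // Map each message name to the names of its parts (the individual parameters).
    static Map<String, List<String>> messageParts(Document doc) {
        Map<String, List<String>> result = new LinkedHashMap<>();
        NodeList messages = doc.getElementsByTagNameNS(WSDL_NS, "message");
        for (int i = 0; i < messages.getLength(); i++) {
            Element message = (Element) messages.item(i);
            List<String> parts = new ArrayList<>();
            NodeList partNodes = message.getElementsByTagNameNS(WSDL_NS, "part");
            for (int j = 0; j < partNodes.getLength(); j++) {
                parts.add(((Element) partNodes.item(j)).getAttribute("name"));
            }
            result.put(message.getAttribute("name"), parts);
        }
        return result;
    }

    // Map each portType operation to {input message, output message}; a missing one is "".
    static Map<String, String[]> operations(Document doc) {
        Map<String, String[]> result = new LinkedHashMap<>();
        NodeList portTypes = doc.getElementsByTagNameNS(WSDL_NS, "portType");
        for (int i = 0; i < portTypes.getLength(); i++) {
            NodeList ops = ((Element) portTypes.item(i)).getElementsByTagNameNS(WSDL_NS, "operation");
            for (int j = 0; j < ops.getLength(); j++) {
                Element op = (Element) ops.item(j);
                result.put(op.getAttribute("name"),
                        new String[] { messageRef(op, "input"), messageRef(op, "output") });
            }
        }
        return result;
    }

    // Read the message attribute of the input or output child and strip any namespace prefix.
    static String messageRef(Element op, String direction) {
        NodeList nodes = op.getElementsByTagNameNS(WSDL_NS, direction);
        if (nodes.getLength() == 0) {
            return "";
        }
        String qname = ((Element) nodes.item(0)).getAttribute("message");
        return qname.substring(qname.indexOf(':') + 1);
    }

    public static void main(String[] args) {
        Document doc = parse(SAMPLE);
        Map<String, List<String>> parts = messageParts(doc);
        for (Map.Entry<String, String[]> op : operations(doc).entrySet()) {
            String in = op.getValue()[0];
            String out = op.getValue()[1];
            System.out.println(op.getKey() + ": " + parts.get(in) + " -> " + parts.get(out));
        }
    }
}
```

Operations are read from portType elements only, since the same operation names appear again under binding elements. Parameters defined through schema element references rather than simple parts would need further unpacking that this sketch leaves out.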

Shame is a powerful motivator

I seem to be unable to maintain a dissertation log*, which is a recommended practice for anyone working on their doctoral dissertation. At least, I can’t maintain one in the form of a paper document or a word processor file.

So I have resorted to maintaining one here, in public, where anyone and everyone can see whether I am maintaining it. Perhaps the potential shame of being called out for not updating it will be enough to keep me at it. I can only hope.

*In the process of proofreading I realized that the document is normally called a “dissertation journal,” but I guess old habits die hard. Those of us in the sea services maintain “logs” of our doings. On reflection, I like the sound of it better than “journal.”