Monday 2 November 2009

Spinning the Semantic Web.

Last week I attended the International Semantic Web Conference, ISWC 2009. The semantic web (SW) project has interested me for a long time, both because it would be a large-scale knowledge representation implementation and because it involves standardizing the languages and approaches for building one. But the semantic web has been taking its sweet time in catching on. Interestingly, I attended a few sessions of ISWC in 2002 or 2003, and Tim Berners-Lee claimed then that we were just on the cusp of having it catch on and that it was picking up speed just the way the web originally did. I think that now, in 2009, some momentum is finally beginning to gather. DBpedia is, essentially, a SW version of Wikipedia that can be queried using the SW query language, SPARQL, and the NY Times is "semantic webifying" itself. But the SW has not caught on at nearly the same speed as the web did, and I think it is useful to ask why. Some of my thoughts:

a) There has been a tendency to make the semantic web a much harder problem than it needed to be. Last week's conference was full of discussions of generating the inferential closure of hundreds of millions of triples (assertions), sophisticated model theory discussions and SPARQL extensions. A new OWL 2.0 spec was released that included n-ary quantifiers. Those are important questions and issues for knowledge representation, but they're not, I would claim, the things to be focusing on when attempting to get the SW implemented on a wide scale. (A good but abstruse example: Last week I found myself in a discussion over a claim that a many-sorted first-order logic implementation of uncertainty representation was preferable to a pure second-order one because completeness and compactness were important features of a web reasoning language. Well, completeness and compactness are important features of a logic in the very purest sense of the word 'logic', i.e., in the sense of keeping logic contentless, but they are not really necessary for a knowledge representation language in such a heavily applied environment. Many argue that second-order logic (SOL) is an appropriate foundation for arithmetic and set theory; surely the internet is not quite as pure as those domains.) RSS implemented simple RDF at one point, but even that proved too complex for full implementation, so why are people worried about packing inference into SPARQL and getting n-ary quantifiers into OWL? Any traction the SW is seeing is in FOAF and linked data; it hasn't been for want of n-ary quantifiers that the SW has been mostly unrealized. Linked data focuses on the relatively simple task of linking data and far less on sophisticated ontologies and knowledge representation issues. This gets to the heart of the reason why the SW has been slow to catch on. The utility of the web was obvious to people who didn't have computer science degrees; the SW, not so much.
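For readers who haven't met the term, here is a toy sketch of what "generating the inferential closure" of a set of triples means. It is only an illustration under heavy assumptions: the class names and the single transitivity rule for "subClassOf" are made up for the example, and real reasoners handle far richer rule sets over vastly more data.

```python
# Toy illustration: compute the closure of a handful of triples under one
# rule (transitivity of "subClassOf"). All names here are hypothetical.

def subclass_closure(triples):
    """Return the input triples plus every subClassOf triple they entail."""
    closure = set(triples)
    changed = True
    while changed:
        changed = False
        # Join every (a subClassOf b) with every (b subClassOf c).
        inferred = {
            (a, "subClassOf", c)
            for (a, p1, b) in closure
            if p1 == "subClassOf"
            for (b2, p2, c) in closure
            if p2 == "subClassOf" and b2 == b
        }
        if not inferred <= closure:
            closure |= inferred
            changed = True
    return closure


asserted = {
    ("Cat", "subClassOf", "Mammal"),
    ("Mammal", "subClassOf", "Animal"),
    ("Animal", "subClassOf", "LivingThing"),
}

closure = subclass_closure(asserted)
for triple in sorted(closure - asserted):
    print("inferred:", triple)
```

Three asserted triples already entail three more. Scale that to hundreds of millions of assertions and many rules and you have a genuinely hard problem, just not the one that seems to be holding the SW back.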

b) Querying the semantic web is difficult. The standard query language for the SW is SPARQL, but in my experience even relatively intelligent web searchers, doctors and the like, are barely capable of using quotes or boolean operators correctly. Why do we think they'll be able to run complex SPARQL queries requiring complicated URL UIDs? SPARQL is useful for sophisticated users deeply familiar with the knowledge representation language and ontology that has been implemented, but it would likely be much harder to use for discovery, a key task in much web usage. And yet those involved in "spinning" the SW seem unwilling to give this problem much consideration.
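To make the point concrete, here is a rough sketch of what even a trivial question ("where was this person born?") looks like as a SPARQL query against DBpedia. The endpoint address and the two identifiers are my assumptions about how DBpedia names things and should be checked before use. The sketch only builds the query and the request URL; it does not contact any server.

```python
from urllib.parse import urlencode

# Assumed identifiers, for illustration only; verify against DBpedia itself.
ENDPOINT = "https://dbpedia.org/sparql"
PERSON = "http://dbpedia.org/resource/Tim_Berners-Lee"
BIRTH_PLACE = "http://dbpedia.org/ontology/birthPlace"


def build_query(subject_uri: str, predicate_uri: str) -> str:
    """Return a SPARQL SELECT asking for every object of (subject, predicate)."""
    return (
        "SELECT ?place WHERE {\n"
        f"  <{subject_uri}> <{predicate_uri}> ?place .\n"
        "}"
    )


def build_request_url(endpoint: str, query: str) -> str:
    """Encode the query as the 'query' parameter of a GET request."""
    return endpoint + "?" + urlencode({"query": query})


query = build_query(PERSON, BIRTH_PLACE)
url = build_request_url(ENDPOINT, query)
print(query)
print(url)
```

Note that the searcher has to know both identifiers in advance, character for character, before asking anything at all. That is fine for someone who lives inside the ontology and hopeless for someone trying to discover what is there.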

c) The development has been very top-down. See (a). The players in the SW are well known and the group is relatively small. We're getting standards passed down for problems that don't yet exist instead of going to the grass roots and trying to solve problems as they arise. Even the venue was evidence of this. The conference was ridiculously expensive and took place at some remote Marriott, completely inaccessible by public transit. Hardly screams "grass roots" or "user input". Tellingly, I heard lots of talk of the need to go out and "spread the word" and "encourage people to use it" or join "meet ups", etc. Or questions about how to get "people to take more interest in the semantic web". People will get interested when we show them it's useful. Let's worry more about that and less about methods of popularization reminiscent of an evangelical church.

d) I'm still of the impression that the SW's original sin was to insist that the URL become the means of designating reference. I think it leads to ontological confusion. We use such strings both to point to pages about X and to refer to X itself, not completely unlike using one string to denote both me and the apartment in which I happen to be living at some point in time. It's handy and solves what could have been a complicated UID problem, but I wonder if it makes the proposed solution seem harder than it needs to be. There has been discussion of this issue amongst those doing the implementing, and I suspect the ontological fuzziness here ends up making the SW fuzzier than it needed to have been.
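A small sketch of the confusion I have in mind, using plain tuples as stand-ins for triples. Every identifier and property name below is hypothetical (the example.org addresses and "lastModified", "birthYear" and "isAbout" are made up for illustration), and the "separated" version is just one possible workaround, not a description of what any particular standard prescribes.

```python
# Hypothetical triples: (subject, property, value). Nothing here is real data.
Triple = tuple[str, str, str]

AMBIGUOUS = "http://example.org/alice"

# One string used for both the page about Alice and Alice herself.
conflated: list[Triple] = [
    (AMBIGUOUS, "lastModified", "2009-11-02"),  # true of the page
    (AMBIGUOUS, "birthYear", "1970"),           # true of the person
]

# One possible workaround: distinct identifiers, linked explicitly.
PERSON = "http://example.org/id/alice"
PAGE = "http://example.org/doc/alice"

separated: list[Triple] = [
    (PAGE, "lastModified", "2009-11-02"),
    (PAGE, "isAbout", PERSON),
    (PERSON, "birthYear", "1970"),
]


def describe(triples: list[Triple], subject: str) -> dict[str, str]:
    """Collect every property asserted about one subject."""
    return {prop: value for (subj, prop, value) in triples if subj == subject}


print(describe(conflated, AMBIGUOUS))
print(describe(separated, PERSON))
print(describe(separated, PAGE))
```

In the conflated version the same "thing" was both born in 1970 and last modified last week. The separated version is cleaner, but notice it has doubled the number of identifiers a user needs to know about.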

Anyway, I think the SW will catch on and is catching on, but I think it could have been happening much more quickly if people had concerned themselves more with making it useful and workable and less with exploiting it as a funding tool for interesting but ancillary AI problems.

