
Wednesday, March 11, 2009

Linked Data: End-User Applications?

Linked Data is a common-sense set of rules on how to use big-S Semantic Web technology to publish data on the Web. More or less[1]:
  • Publish your data as RDF[2].
  • Use HTTP URIs to name resources in your RDF.
  • Make the URIs dereferenceable, so that looking one up returns more RDF.
  • Use a standard schema language (probably OWL[3]).
It's also probably safe to assume that the RDF used in Linked Data is "informal" and that there's a lot of it[4].
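The third rule is where plain HTTP comes in. A client "dereferences" a URI with an ordinary GET and uses the Accept header to ask for RDF instead of HTML (content negotiation). Here's a minimal sketch using only Python's standard library, which I'm using purely for illustration. The preferred media types and their q-values are my own assumption, and parsing the returned RDF is left out.

```python
import urllib.request

# Assumed preference list: Turtle first, RDF/XML as a fallback.
RDF_ACCEPT = "text/turtle, application/rdf+xml;q=0.9"


def build_dereference_request(uri: str) -> urllib.request.Request:
    """Build (but don't send) a GET that asks the server for RDF."""
    # Linked Data names resources with HTTP URIs, so anything else is rejected.
    if not uri.startswith(("http://", "https://")):
        raise ValueError("not an HTTP URI: " + uri)
    return urllib.request.Request(uri, headers={"Accept": RDF_ACCEPT})


def dereference(uri: str, timeout: float = 10.0) -> tuple[str, bytes]:
    """Fetch a URI and return (content type, raw body).

    urlopen follows redirects (including 303 See Other), and the Accept
    header is carried over to the redirected request.
    """
    req = build_dereference_request(uri)
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return resp.headers.get_content_type(), resp.read()
```

Keeping request construction separate from the network call makes it easy to check the headers without hitting a live server.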

Linked Data has been gaining momentum lately as sort of a down-to-earth version of the Semantic Web. The fact that it's relatively low-effort[5] means that there is quite a bit of data being published, even though very few applications outside the laboratory make use of it.

Linked Data has obvious uses inside the enterprise. The BBC has done some great work using web development techniques based on a combination of hard-core enterprise integration and brand-spanking-new semantic web technology. Linked Data behind (and just in front of) the firewall is an exciting area, and I believe that is where it will find its first widespread commercial acceptance. But that's another blog post.

This post is about real end-user applications that take advantage of Linked Data. By "real" I mean something that sits on your desktop (or in your phone, or maybe even in your web browser) with a rich user interface tailored for a particular task. I don't mean generic browsers or infrastructure components.

Since I had no clue what such an app would look like, I decided to take a generate-and-test approach: list the defining properties of Linked Data, come up with some consequences of those properties, then take the Cartesian product. Somewhere in there there must be a pony or two.

After some trial and error, I picked the following:
  • Is the application meant to handle more than a fixed set of data? For example, a browser like Marbles[6] is meant to navigate across all possible Linked Data, while other systems limit themselves to, say, social network graphs as expressed by FOAF, or even a fixed set of in-house data sources.
  • Is the linking visible to the user? For example, in a generic browser the user sees the links and chooses which ones to navigate across. The links are central. On the other hand, a social network browser might automatically choose how to spider a FOAF network, and present the user with a summarized view containing data from many sources.
  • Is reasoning important? That is, is the raw data presented to the user, or will new triples be generated (or filtered) using formal (or informal) reasoning?
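"Generating new triples" is easiest to see with an example. Below is a minimal sketch, assuming the data is already a set of (subject, predicate, object) tuples and using prefixed names like `rdf:type` in place of full URIs. It forward-chains two RDFS rules (subclass transitivity, and instances of a subclass being instances of its superclasses) until nothing new appears. A real application would more likely hand this to an RDFS/OWL reasoner, and class names such as `:HistoricHouse` are hypothetical.

```python
Triple = tuple[str, str, str]

# Prefixed names stand in for full URIs to keep the example short.
RDF_TYPE = "rdf:type"
SUBCLASS_OF = "rdfs:subClassOf"


def infer_types(triples: set[Triple]) -> set[Triple]:
    """Return only the triples synthesized by two RDFS rules:
    (x type A) + (A subClassOf B)       => (x type B)
    (A subClassOf B) + (B subClassOf C) => (A subClassOf C)
    """
    closure = set(triples)
    while True:
        new = set()
        for s, p, o in closure:
            if p not in (RDF_TYPE, SUBCLASS_OF):
                continue
            for s2, p2, o2 in closure:
                # o is itself a subclass of o2, so (s p o) implies (s p o2).
                if p2 == SUBCLASS_OF and s2 == o:
                    new.add((s, p, o2))
        new -= closure
        if not new:  # fixpoint reached
            return closure - set(triples)
        closure |= new
```

The raw data says only that `:lincolnHome` is a `:HistoricHouse`; the reasoner adds that it is also a `:HistoricBuilding` and a `:Place`.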
For reasoning, it's pretty much yes or no:
  • Synthesizes new triples? { Yes, No }
The same goes for linking: either the program makes the underlying low-level links visible and primary, or it covers them up somehow:
  • Navigation? { User-visible Links, Invisible Links }
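Here's roughly what "invisible links" might look like in code: a minimal sketch that spiders `foaf:knows` links breadth-first and hands back only a merged {person: name} summary, so the user never chooses which link to follow. It assumes a hypothetical `fetch` function, passed in as a parameter, that dereferences a URI and returns its triples. The `max_docs` cap is also my assumption; an unbounded crawl of real FOAF data could run for a very long time.

```python
from collections import deque
from typing import Callable, Iterable

Triple = tuple[str, str, str]
FOAF_KNOWS = "http://xmlns.com/foaf/0.1/knows"
FOAF_NAME = "http://xmlns.com/foaf/0.1/name"


def spider_names(start: str,
                 fetch: Callable[[str], Iterable[Triple]],
                 max_docs: int = 50) -> dict[str, str]:
    """Crawl foaf:knows links breadth-first from `start`, fetching at most
    `max_docs` documents, and return {person URI: first name seen}."""
    names: dict[str, str] = {}
    seen = {start}
    queue = deque([start])
    fetched = 0
    while queue and fetched < max_docs:
        uri = queue.popleft()
        fetched += 1
        for s, p, o in fetch(uri):
            if p == FOAF_NAME:
                names.setdefault(s, o)
            elif p == FOAF_KNOWS and o not in seen:
                # The program, not the user, decides to follow this link.
                seen.add(o)
                queue.append(o)
    return names
```

Passing an in-memory dict as `fetch` (for example `lambda u: WEB.get(u, [])`) lets you try it without touching the network.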
The extensibility question turns out to be a little more complicated than a simple Yes/No. While there are clearly some programs that are totally generic and others that are totally fixed, there are interesting cases in the middle. I added another possibility: applications that try to do something with data they don't totally understand (maybe by understanding ontology fragments, like parts of FOAF or geodata).
  • Extensible? { Yes, Somewhat, No }
There's an interesting tradeoff here: the more specific the knowledge an application has about the data it works with, the better crafted its user experience can be.

I am not happy with the list above. It leaves out some important characteristics, conflates others and is generally unsatisfactory. I originally had closer to a dozen characteristics, but the resulting combinatorial explosion made things awkward. But the list is just for inspiration, and so I'm willing to live with it.
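The generate step itself is just a Cartesian product of the three dimensions: 2 × 2 × 3 = 12 combinations. A short sketch using `itertools.product`; the dimension names and values come straight from the list above, but the row numbers it prints are an artifact of the ordering chosen here and needn't match the row numbers used later in the post.

```python
from itertools import product

# The three dimensions from the post, in the order they were introduced.
DIMENSIONS = {
    "Synthesizes new triples?": ("Yes", "No"),
    "Navigation?": ("User-visible Links", "Invisible Links"),
    "Extensible?": ("Yes", "Somewhat", "No"),
}


def design_space() -> list[dict[str, str]]:
    """Every combination of dimension values, one dict per candidate app."""
    names = list(DIMENSIONS)
    return [dict(zip(names, combo)) for combo in product(*DIMENSIONS.values())]


if __name__ == "__main__":
    for i, row in enumerate(design_space(), start=1):
        print(i, row)
```

Adding a dimension is one more dict entry, which is also exactly how the explosion to a dozen characteristics gets out of hand.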

And the results:

Existing Linked Data applications like browsers and analysis tools show up in their own categories. That's not surprising, since that's how the categories were chosen.

I especially like the look of the "hybrid" applications: the ones that combine hardcoded knowledge of the data with the ability to process new data discovered through following links. If there's a pony to be found, I suspect it's in one of those rows.
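A hybrid might look something like this minimal sketch: the application has hand-written handling for a few predicates it knows (here `rdfs:label` and the W3C WGS84 `geo:lat`/`geo:long` properties) and falls back to showing raw (predicate, value) pairs for anything else it picks up by following links. The function name `summarize`, the view layout, and the `example.org` architect predicate in the usage below are all hypothetical.

```python
GEO_LAT = "http://www.w3.org/2003/01/geo/wgs84_pos#lat"
GEO_LONG = "http://www.w3.org/2003/01/geo/wgs84_pos#long"
RDFS_LABEL = "http://www.w3.org/2000/01/rdf-schema#label"


def summarize(subject, triples):
    """Build a view of one resource from (s, p, o) triples.

    Known predicates get tailored fields; everything else is kept as
    generic (predicate, value) pairs instead of being thrown away.
    """
    view = {"label": None, "location": None, "other": []}
    lat = lon = None
    for s, p, o in triples:
        if s != subject:
            continue
        if p == RDFS_LABEL and view["label"] is None:
            view["label"] = o
        elif p == GEO_LAT:
            lat = float(o)
        elif p == GEO_LONG:
            lon = float(o)
        else:
            # Data the app doesn't understand: show it raw.
            view["other"].append((p, o))
    if lat is not None and lon is not None:
        view["location"] = (lat, lon)
    return view
```

The hardcoded fields are what make a polished UI possible (a map pin, a title), while the `other` list is the escape hatch for data nobody planned for.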

I'm currently working on an iPhone application called "National Register Radar" that uses geolocation and a Linked Data version of the U.S. National Register of Historic Places database to help users maintain "situational awareness" of the history of the places around them. Right now it would be a first-row application: it has hardcoded knowledge of specific kinds of Linked Data, it hides the low-level linking and provides a summary view, and it presents the data as it finds it, with no logical reasoning.

Although it's a relatively immature use of Linked Data, hardcoding makes developing the initial version of the application much easier. It means that few external libraries are required (an important consideration on a mobile device with a non-x86 processor and no Java). It also means that traditional application development techniques apply: I don't need to mess around with an on-phone triple store and SPARQL queries.
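To make the hardcoding point concrete, here's a minimal sketch of the "traditional" approach: records with a fixed, known shape plus a plain distance filter, with no triple store or SPARQL involved. It's in Python for illustration and is not the actual iPhone code; the `Place` record, the `nearby` function, and the use of the haversine great-circle formula are my assumptions about how such a radar might work.

```python
import math
from dataclasses import dataclass

EARTH_RADIUS_KM = 6371.0  # mean Earth radius


@dataclass(frozen=True)
class Place:
    """Hypothetical hardcoded record: the app knows exactly which fields exist."""
    name: str
    lat: float  # degrees
    lon: float  # degrees


def haversine_km(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance in km between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def nearby(places: list[Place], lat: float, lon: float, radius_km: float) -> list[Place]:
    """Places within radius_km of (lat, lon), nearest first."""
    hits = [(haversine_km(lat, lon, p.lat, p.lon), p) for p in places]
    hits.sort(key=lambda h: h[0])  # sort on distance only; Place isn't orderable
    return [p for d, p in hits if d <= radius_km]
```

Everything here is ordinary application code, which is the appeal: the Linked Data part shrinks to "load these records" and the rest is conventional UI work.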

I think, though, that I'd like to turn it into a row 6 "enhanced special-purpose browser" that's not restricted to just a few hardcoded data sets. It's unclear how all the technology would fit together (how can the application make use of DBpedia data without hardcoding? Can Fresnel fit on a phone? Can Fresnel be adapted for voice output?) but it's worth a try (and probably a follow-up blog post).

Ultimately, I suspect Linked Data will be more at home deep inside enterprise infrastructure than at the end-user level, but I'm going to give National Register Radar another couple of iterations and see where it ends up.

I'll be demoing an alpha version at BarCampAustin4. If you're interested in Linked Data (or speech synthesis) on the iPhone and are going to be in Austin, ping me and maybe we can get a BOF together...

[1] The "official" list is here: http://www.w3.org/DesignIssues/LinkedData.html, or even better, here: http://www4.wiwiss.fu-berlin.de/bizer/pub/LinkedDataTutorial/

[2] RDF is the data model for the semantic web. Think of it as an ultra-hyper-normalized database where everything has been reduced to three columns: <subject> <predicate> <object>. Database keys are in the form of URIs. Don't let incredibly over-elaborate explanations fool you: it really is that simple.

[3] OWL is a popular schema language for RDF. Since reducing everything to triples removes pretty much all the type data, you need a schema language to add it back in. It's a little like a much more powerful version of XML Schema that lets you bake in the kind of semantics that would normally go into the comments. The documentation for it is uniformly awful.

[4] There's this really great article on just this topic that I can't seem to find a reference to right now. I'll edit it in later. Hey, it's a blog post, not a journal article, whadaya expect?

[5] It's relatively low effort because some pioneers have put in tons of effort developing some slick tools to help out.

[6] If it's up and running. Few of the commonly referenced Linked Data browsers worked reliably for me.



Blogger mhausenblas said...

Great article - we started to collect here as well [1].


[1] http://esw.w3.org/topic/SweoIG/TaskForces/CommunityProjects/LinkingOpenData/Applications

8:40 AM  
Blogger Fran Sansalone said...

Christopher -
Please check out Thomson Reuters' work at www.opencalais.com - we're working on a lot of initiatives that might be of interest.
Fran Sansalone
Calais Community Manager

9:06 AM  
Blogger Zemantic dreams said...


Very interesting thinking, but maybe you put too much emphasis on exposing the semantic web stack.

Have you heard of Zemanta (the blogging helper)?

I am the CTO, and what we do is leverage the semantic web to provide a new experience while writing articles.

While the semantic web is leveraged, we don't care about triples or the like; we care about delivering value.

Try it out (the only thing we do with triples that is visible to the end user is semantic tagging markup, which you can turn on in preferences).

I'd be interested to know where you would put Zemanta in your classification.

(we naturally have an API and return RDF/XML, but let's first talk about end-user product)

Andraz Tori, Zemanta

9:35 AM  