
Sunday, March 29, 2009

Talis Connected Commons

I've got a Talis Platform developer instance that I've been using to host a tiny subset of the information from the National Register of Historic Places database. I spent some time this weekend at VoCampAustin trying to get the rest ontologized, and I was wondering what I was going to do when it came time to host the entire data set. Luckily for me, Talis just announced the Talis Connected Commons:
"...if you own, or are creating, a public domain dataset then you can store that data in the Platform as RDF, for free. We’re setting an initial cap of 50 million triples on each dataset, but that should be plenty of space in which to collect some really interesting data."
While Amazon will "host" your public data set, that just means they provide some free disk space you can use from within your EC2 instance: you've still got to pay for bandwidth. With the Connected Commons, Talis takes care of those pesky bandwidth charges. I'm sure there are "within reason" disclaimers in there somewhere, but for anyone considering RDF-izing and publishing public data sets, the Talis offer is definitely worth checking out.

You should follow me on twitter here.

Wednesday, March 25, 2009

Facebook Redirects : The URL is not my son


If you didn't get here via a very specific Google search[1] you probably want to stop reading now, otherwise... Those who eschew the standard Facebook PHP libraries are rewarded with a fuller, richer understanding of the intricacies of the API. I.e. they have to deal with obscure crud that normal people get to ignore. This particular cruddy bit recently managed to eat more time than it had any right to consume:
The URL http://www.new.facebook.com/login.php?v=1.0&canvas&next=http%3A%2F%2Fapps.facebook.com%2Fhistoryradar%2F&api_key=643f14fgg7294eff is not valid.
That error message was in response to a redirect to force authorization if there was no session information on a canvas page request. I knew that the URL was in fact valid since not only was it straight out of the Facebook API documentation, I could cut and paste it into a browser and get the correct response. At first I thought it was an encoding issue (the "next" value needs to be encoded since it's a URL inside a parameter) but it took digging into the standard PHP library to find the real problem:
    public function redirect($url) {
      if ($this->in_fb_canvas()) {
        // on a canvas page, emit FBML and let Facebook do the redirect
        echo '<fb:redirect url="' . $url . '"/>';
      } else if (preg_match('/^https?:\/\/([^\/]*\.)?facebook\.com(:\d+)?/i', $url)) {
        // make sure facebook.com url's load in the full frame so that we don't
        // get a frame within a frame.
        echo "<script type=\"text/javascript\">\ntop.location.href = \"$url\";\n</script>";
      } else {
        // anything else gets an ordinary HTTP redirect
        header('Location: ' . $url);
      }
      exit;
    }
Right. You don't need a "real" HTTP 302 redirect, you need to send a special snippet of FBML. It's easy to forget that Facebook is its own little Bizarro World until you step just a little outside the approved way of doing things.
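
For anyone handling canvas requests without the PHP library, the same three-way decision can be sketched in Python. This is a rough port of the logic above, not Facebook code: the name `build_redirect` and its `(status, headers, body)` return shape are made up for illustration, and the escaping of the URL is a defensive addition that the PHP version doesn't have.

```python
import json
import re
from html import escape

# Same pattern the PHP library uses to spot facebook.com URLs.
FACEBOOK_URL = re.compile(r'^https?://([^/]*\.)?facebook\.com(:\d+)?', re.IGNORECASE)


def build_redirect(url, in_fb_canvas):
    """Return (status, headers, body) for sending the user to url.

    Hypothetical helper: the caller is assumed to write these values
    to the HTTP response and then stop processing the request.
    """
    if in_fb_canvas:
        # Canvas pages are FBML, so ask Facebook to redirect for us.
        body = '<fb:redirect url="%s"/>' % escape(url, quote=True)
        return 200, [], body
    if FACEBOOK_URL.match(url):
        # Load facebook.com URLs in the top frame to avoid a frame in a frame.
        script = '<script type="text/javascript">\ntop.location.href = %s;\n</script>'
        return 200, [], script % json.dumps(url)
    # Everything else gets an ordinary HTTP redirect.
    return 302, [("Location", url)], ""
```

The point is the first branch: on a canvas page the "redirect" is a 200 response whose body is a scrap of FBML, and the 302 only ever happens outside the canvas.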

[1] No, not "the url is not my son." I mean one with "login.php" and "url" and "not valid" in it somewhere.


Wednesday, March 11, 2009

Linked Data: End-User Applications?

Linked Data is a common-sense set of rules on how to use big-S Semantic Web technology to publish data on the Web. More or less[1]:
  • Publish your data as RDF[2].
  • Use HTTP URIs to name resources in your RDF.
  • Make the URIs dereferenceable, so that fetching one returns more RDF.
  • Use a standard schema language (probably OWL[3]).
It's also probably safe to assume that the RDF used in Linked Data is "informal" and that there's a lot of it[4].
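
To make the rules a little more concrete, here is a minimal sketch in Python that treats RDF as the three-column table described in footnote [2]. The `example.org` namespace, the two sample sites and the helper names `match` and `rdf_request` are all invented for illustration; only the `rdf:type`, `rdfs:label` and WGS84 `lat` property URIs are real vocabulary terms. A real application would more likely use an RDF library than bare tuples.

```python
from urllib.request import Request

RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
RDFS_LABEL = "http://www.w3.org/2000/01/rdf-schema#label"
GEO_LAT = "http://www.w3.org/2003/01/geo/wgs84_pos#lat"
EX = "http://example.org/nrhp/"  # hypothetical namespace, not a real data set

# Invented sample data: each row is (subject, predicate, object).
# Subjects and predicates are HTTP URIs; objects are URIs or plain values.
TRIPLES = [
    (EX + "site/1", RDF_TYPE, EX + "HistoricPlace"),
    (EX + "site/1", RDFS_LABEL, "Example Courthouse"),
    (EX + "site/1", GEO_LAT, "30.27"),
    (EX + "site/2", RDF_TYPE, EX + "HistoricPlace"),
    (EX + "site/2", RDFS_LABEL, "Example Depot"),
]


def match(triples, s=None, p=None, o=None):
    """Return the triples that fit the pattern; None is a wildcard."""
    return [t for t in triples
            if (s is None or t[0] == s)
            and (p is None or t[1] == p)
            and (o is None or t[2] == o)]


def rdf_request(uri):
    """Build (but do not send) a GET request that asks for RDF/XML."""
    return Request(uri, headers={"Accept": "application/rdf+xml"})


labels = [o for _, _, o in match(TRIPLES, p=RDFS_LABEL)]
print(labels)  # ['Example Courthouse', 'Example Depot']
```

Dereferencing then amounts to sending `rdf_request(some_uri)` with `urllib.request.urlopen` and parsing whatever RDF comes back into more rows for the same table.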

Linked Data has been gaining momentum lately as sort of a down-to-earth version of the Semantic Web. The fact that it's relatively low-effort[5] means that there is quite a bit of data being published, despite the fact that very few applications outside the laboratory make use of it.

Linked Data has obvious uses inside the enterprise. The BBC has done some great work using web development techniques based on a combination of hard-core enterprise integration and brand-spanking-new semantic web technology. Linked Data behind (and just in front of) the firewall is an exciting area, and I believe that is where it will find its first widespread commercial acceptance. But that's another blog post.

This post is about real end-user applications that take advantage of Linked Data. By "real" I mean something that sits on your desktop (or in your phone, or maybe even web browser) with a rich user interface tailored for a particular task. I don't mean generic browsers or infrastructure components.

Since I had no clue what such an app would look like, I decided to take a generate-and-test approach: list the defining properties of Linked Data, come up with some consequences of those properties, then take the Cartesian product. Somewhere in there there must be a pony or two.

After some trial and error, I picked the following:
  • Is the application meant to handle more than a fixed set of data? For example, a browser like Marbles[6] is meant to navigate across all possible Linked Data, while other systems limit themselves to, say, social network graphs as expressed by FOAF, or even a fixed set of in-house data sources.
  • Is the linking visible to the user? For example, in a generic browser the user sees the links and chooses which ones to navigate across. The links are central. On the other hand, a social network browser might automatically choose how to spider a FOAF network, and present the user with a summarized view containing data from many sources.
  • Is reasoning important? That is, is the raw data presented to the user, or will new triples be generated (or filtered) using formal (or informal) reasoning?
For reasoning, it's pretty much yes or no:
  • Synthesizes new triples? { Yes, No }
The same goes for linking: either the program makes the underlying low-level links visible and primary, or it covers them up somehow:
  • Navigation? { User-visible Links, Invisible Links }
The extensibility question turns out to be a little more complicated than a simple Yes/No. While there are clearly some programs that are totally generic and others that are totally fixed, there are interesting cases in the middle. I added another possibility: applications that try to do something with data they don't totally understand (maybe by understanding ontology fragments like parts of FOAF or geodata).
  • Extensible? { Yes, Somewhat, No }
There's an interesting tradeoff, where the more specific the knowledge an application has about the data it works with, the better crafted the user experience can be.

I am not happy with the list above. It leaves out some important characteristics, conflates others and is generally unsatisfactory. I originally had closer to a dozen characteristics, but the resulting combinatorial explosion made things awkward. But the list is just for inspiration, and so I'm willing to live with it.
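
The generate-and-test step itself is tiny. As a rough sketch in Python (the dictionary keys and values are just the three lists above; the row order is whatever `itertools.product` yields and is not necessarily the order used in the results):

```python
from itertools import product

# The three characteristics and their possible values, as listed above.
dimensions = {
    "Extensible?": ["Yes", "Somewhat", "No"],
    "Navigation?": ["User-visible Links", "Invisible Links"],
    "Synthesizes new triples?": ["Yes", "No"],
}

# One candidate application category per combination of values.
rows = [dict(zip(dimensions, combo)) for combo in product(*dimensions.values())]

for number, row in enumerate(rows, start=1):
    print(number, row)

print(len(rows))  # 3 * 2 * 2 = 12 candidate categories
```

Three characteristics give 12 rows, which is easy to eyeball. A dozen characteristics, even if each were only a yes/no choice, would give 2**12 = 4096 rows, which is the combinatorial explosion in question.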

And the results:



Linked Data applications like browsers and analysis tools show up with their own categories. That's not surprising since that's how the categories were chosen.

I especially like the look of the "hybrid" applications: the ones that combine hardcoded knowledge of the data with the ability to process new data discovered through following links. If there's a pony to be found, I suspect it's in one of those rows.

I'm currently working on an iPhone application called "National Register Radar" that uses geolocation and a Linked Data version of the U.S. National Register of Historic Places database to help users maintain "situational awareness" of the history of the places around them. Right now it would be a first-row application: it has hardcoded knowledge of specific kinds of Linked Data, it hides the low-level linking and provides a summary view, and it presents the data as it finds it, with no logical reasoning.

Although it's a relatively immature use of Linked Data, hardcoding makes developing the initial version of the application much easier. It means that few external libraries are required (an important consideration on a mobile device with a non-x86 processor and no Java). It also means that traditional application development techniques apply: I don't need to mess around with an on-phone triple store and SPARQL queries.

I think, though, that I'd like to turn it into a row 6 "enhanced special-purpose browser" that's not restricted to just a few hardcoded data sets. It's unclear how all the technology would fit together (how can the application make use of DBpedia data without hardcoding? Can Fresnel fit on a phone? Can Fresnel be adapted for voice output?) but it's worth a try (and probably a follow-up blog post).

Ultimately, I suspect Linked Data will be more at home deep inside enterprise infrastructure than at the end-user level, but I'm going to give National Register Radar another couple of iterations and see where it ends up.

I'll be demoing an alpha version at BarCampAustin4. If you're interested in Linked Data (or speech synthesis) on the iPhone and are going to be in Austin, ping me and maybe we can get a BOF together...

[1] The "official" list is here: http://www.w3.org/DesignIssues/LinkedData.html, or even better, here: http://www4.wiwiss.fu-berlin.de/bizer/pub/LinkedDataTutorial/

[2] RDF is the data model for the semantic web. Think of it as an ultra-hyper-normalized database where everything has been reduced to three columns: <subject> <predicate> <object>. Database keys are in the form of URIs. Don't let incredibly over-elaborate explanations fool you: it really is that simple.

[3] OWL is a popular schema language for RDF. Since reducing everything to triples removes pretty much all the type data, you need a schema language to add it back in. It's a little like a much more powerful version of XML Schema that lets you bake in the kind of semantics that would normally go into the comments. The documentation for it is uniformly awful.

[4] There's this really great article on just this topic that I can't seem to find a reference to right now. I'll edit it in later. Hey, it's a blog post, not a journal article, whadaya expect?

[5] It's relatively low effort because some pioneers have put in tons of effort developing some slick tools to help out.

[6] If it's up and running. Few of the commonly referenced Linked Data browsers worked reliably for me.
