
Friday, July 10, 2009

303 Madness and the Giant Global Graph

I had the opportunity to do a short talk at the latest Semantic Web Dallas meetup. I decided on an overview of the 303-redirect dance that differentiates a URI that points to a web page from a URI that names a concept in the Semantic Web. Yes, there's a difference. Yes, it's an important difference. Probably. In any case, it's a good topic for a 10-minute talk because having to listen to stuff like this for more than ten minutes at a time can lead to bleeding from the ears. It's a complex issue with an, uhh, unexpected? solution, best approached with a sense of humor. Well, maybe not best approached that way, but it seemed like a good idea at the time. And the list of references at the end is pretty good.



If you read the references you'll learn that you can also use URLs with fragment identifiers in your RDF. But doing it that way doesn't involve a fundamental redefinition of part of HTTP so it's a lot less entertaining.
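For the curious, here is a minimal sketch of the dance itself. The URI layout is purely my assumption for illustration (`/id/...` names a concept, `/doc/...` is a page about it); nothing in the talk prescribes these paths. A request for the concept URI gets a 303 See Other pointing at the document, and only the document answers with a 200.

```python
# Hypothetical URI layout, assumed for illustration only:
#   /id/<name>   names a concept (a person, a place, an idea)
#   /doc/<name>  is an ordinary web document describing that concept

def respond(path):
    """Return (status, headers) for a GET on the given path."""
    if path.startswith("/id/"):
        # A concept cannot be sent over the wire, so point the client
        # at a document about it with 303 See Other.
        name = path[len("/id/"):]
        return 303, {"Location": "/doc/" + name}
    if path.startswith("/doc/"):
        # A plain document: answer normally.
        return 200, {"Content-Type": "text/html"}
    return 404, {}


if __name__ == "__main__":
    for p in ("/id/alice", "/doc/alice", "/nope"):
        print(p, respond(p))
```

The point of the indirection is that a client which gets a 200 knows it asked for a document, while a client which gets a 303 knows the URI it asked for names something else.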

You should follow me on twitter here.

Thursday, July 02, 2009

Popstat on Google App Engine

Popstat is the demo application from my Facebook Dev Garage Dallas presentation. It just posts a status message to Facebook and Twitter to demonstrate using both Facebook Connect and an external service. I developed it on my laptop and didn't have time to move it to a public host before the event. I wanted it out there live someplace and figured it was a good opportunity to try out Google App Engine's Java support (Popstat uses Grails with a mix of Groovy and Java).

I got it all working, but it was a pain.
  • I used the Grails AppEngine plugin. I liked it.
  • App Engine provides storage, but not in the form of a relational database. It's close enough that JPA and JDO both work (but not Hibernate, yet). I chose JPA, but either way you'll need to annotate your domain classes yourself (I expected the GORM-JPA plugin to do that for me, but it didn't).
  • You'll need to put your domain classes into named packages. Things (silently) don't go well if you leave them in the default package.
  • If you're using JPA, domain classes will need to explicitly declare an id field. Make it a Long, and add the @Id and @GeneratedValue annotations. Use GenerationType.IDENTITY.
  • I was able to use the dynamic save() method provided by GORM-JPA, but I had to wrap up the calls in a withTransaction block, and the semantics are slightly different (use merge() instead of save() for updates).
  • Depending on your version of Spring, you may get a message along the lines of "'org.springframework.context.annotation.internalPersistenceAnnotationProcessor': Initialization of bean failed" with something about "java.lang.NoClassDefFoundError: javax/naming/NamingException". The fix here worked for me.
  • Popstat uses the facebook-java-api library. Since App Engine forbids the use of JAXB, I had to switch to the JSON version of the client to avoid an error about JAXBContext.
  • To talk to Twitter, Popstat uses the oauth-signpost library. But Signpost depends on Apache HttpClient, and HttpClient uses low-level Socket calls forbidden by App Engine. I hacked Signpost to use URLConnection, but I wouldn't recommend that approach. If I had to do it again, I'd look around for an OAuth library that worked out of the box.
  • By default, the App Engine Java Development Server (a version of the App Engine environment you can run on your local machine) binds to localhost only. The command line client has a "--address" option, but the "grails app-engine run" command doesn't. I hacked the scripts/AppEngine.groovy plugin and hardcoded the address parameter into startDevServer().
There was some other stuff that I didn't take notes on, but (other than registration being turned off) Popstat is doing what it did before.

Overall, though, it wasn't a great experience. Google turns off random bits of Java (for security and ease of management), which means that very few third-party libraries are going to work. You'll probably have to do some porting of your own code as well. That, combined with the admin service being down all morning, left a bad taste. The free hosting thing is great for demo apps but I think I'll stick to something like Amazon EC2 for real work. I'm very curious to see how Microsoft Azure stacks up (it's much more of a direct competitor to App Engine than the roll-it-all-yourself EC2).


Tuesday, June 30, 2009

The Semantic Web or The Generic at War with the Specific

It's easy to imagine an application that takes advantage of Linked Data by extracting just what it needs and dumping it into a local relational database. But that's clearly cheating. It's equally easy to imagine a completely generic low-level Linked Data browser, but there's something less than completely satisfying about that, too. The basic problem is that a rich user experience requires specifics, while taking full advantage of the "anyone can say anything about anything" nature of the semantic web means that applications must be able to handle almost totally generic data[1]. At least that was the theme of my presentation to the Dallas chapter of the IxDA earlier tonight...



I'm especially proud of the way I failed to force people to sit through a detailed explanation of graph structures, subject-predicate-object triples, the use of URIs as identifiers, or any of the other traditional cruft that obscures the capabilities of semantic web technology under a morass of unnecessary detail. (Imagine introducing relational databases by first forcing people to understand index paging mechanisms, or learning to cook via an explanation of organic chemistry). The audience seemed to appreciate it.

[1] I struggled with this earlier over in /2009/03/linked-data-end-user-applications.html

[2] The translation from Keynote to Powerpoint to Google docs was not without problems. And you will definitely need to click through and get a larger version to read some of the screens.


Sunday, June 28, 2009

Facebook, Intrigue, Betrayal, Murder

A working understanding of authentication and authorization protocols is key to making use of modern web APIs. But protocols like the three-party delegated authc/authz[1] typical of modern web services can be difficult to follow. Role-playing protocol participants[2] is a fun way to make a very abstract process concrete, so I decided to write, produce and direct some geek theater at my recent Facebook Developer Garage Dallas presentation. When you get to the script pages, imagine Alice played by about the least feminine guy you've ever seen and you'll have the right atmosphere (you might need to click through and view the presentation full-sized to read the text on some pages).



I finished up with a quick review of some very traditional distributed programming topics. The questions "just how many test cases would you need to cover the possible states your program can be in?" and "what makes you think you can test these modules independently?" get people thinking along the right lines.

Oh. In the end Alice runs off with Bob and all of Dave's money, leaving him on the hook with the Mafia for four guns and several bribes. Such is life in the high-stakes world of distributed programming.


[1] Authc = authentication, or identifying a user, and authz = authorization, or determining what services a user is allowed to make use of once they're identified. Authentication says who you are, authorization says what you can do. In the presentation I talk specifically about delegated authc/authz, and ignore the more traditional single-process examples. People seem surprised to learn that OAuth, which is an authorization protocol, doesn't necessarily tell your application the userid of the user (although many implementations include the info along with the authorization tokens that are the primary purpose of the protocol). It doesn't help that the OAuth spec confuses the two.
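A toy sketch of that last point, in case it helps. This is not OAuth and not any real library's API; `ServiceProvider`, `authorize` and `post_status` are names I made up for illustration. The application ends up holding an opaque token that lets it do one thing on the user's behalf, without ever being told who the user is.

```python
import secrets


class ServiceProvider:
    """Toy stand-in for the service that owns the user's account (hypothetical)."""

    def __init__(self):
        self._grants = {}   # token -> (user id, frozenset of permitted actions)
        self.timeline = []  # (user id, text) pairs, visible only to the provider

    def authorize(self, user_id, actions):
        # The user approves the request at the provider; the application
        # gets back only an opaque random token.
        token = secrets.token_hex(8)
        self._grants[token] = (user_id, frozenset(actions))
        return token

    def post_status(self, token, text):
        # Authorization check: is this token allowed to perform this action?
        grant = self._grants.get(token)
        if grant is None or "post_status" not in grant[1]:
            raise PermissionError("token does not allow post_status")
        user_id, _ = grant
        # The provider knows whose token it is; the caller never has to.
        self.timeline.append((user_id, text))


if __name__ == "__main__":
    provider = ServiceProvider()
    token = provider.authorize("alice", {"post_status"})
    provider.post_status(token, "hello from a third-party app")
    print(provider.timeline)
```

The token answers "what may the holder do?" and nothing else. Learning who the user is takes a separate step, which is exactly the part many implementations bolt on.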

[2] So, admittedly, the examples aren't usually acted out in front of an audience, but the role-playing does have a long and honored history. The script actually simplifies the real protocol considerably, but it should give the correct flavor: http://www.networkworld.com/news/2005/020705widernetaliceandbob.html


Sunday, March 29, 2009

Talis Connected Commons

I've got a Talis Platform developer instance that I've been using to host a tiny subset of the information from the National Register of Historic Places database. I spent some time this weekend at VoCampAustin trying to get the rest ontologized, and I was wondering what I was going to do when it came time to host the entire data set. Luckily for me Talis just announced the Talis Connected Commons:
...if you own, or are creating, a public domain dataset then you can store that data in the Platform as RDF, for free. We’re setting an initial cap of 50 million triples on each dataset, but thats should be plenty of space in which to collect some really interesting data.
While Amazon will "host" your public data set, that just means they provide some free disk space you can use from within your EC2 instance: you've still got to pay for bandwidth. With the Connected Commons, Talis takes care of those pesky bandwidth charges. I'm sure there are "within reason" disclaimers in there somewhere, but for anyone considering RDF-izing and publishing public data sets, the Talis offer is definitely worth checking out.


Wednesday, March 25, 2009

Facebook Redirects : The URL is not my son


If you didn't get here via a very specific Google search[1] you probably want to stop reading now, otherwise... Those who eschew the standard Facebook PHP libraries are rewarded with a fuller, richer understanding of the intricacies of the API. I.e. they have to deal with obscure crud that normal people get to ignore. This particular cruddy bit recently managed to eat more time than it had any right to consume:
The URL http://www.new.facebook.com/login.php?v=1.0&canvas&next=http%3A%2F%2Fapps.facebook.com%2Fhistoryradar%2F&api_key=643f14fgg7294eff is not valid.
That error message was in response to a redirect to force authorization if there was no session information on a canvas page request. I knew that the URL was in fact valid since not only was it straight out of the Facebook API documentation, I could cut and paste it into a browser and get the correct response. At first I thought it was an encoding issue (the "next" value needs to be encoded since it's a URL inside a parameter) but it took digging into the standard PHP library to find the real problem:
public function redirect($url) {
  if ($this->in_fb_canvas()) {
    // Inside a canvas page, emit an FBML tag instead of an HTTP redirect.
    echo '<fb:redirect url="' . $url . '"/>';
  } else if (preg_match('/^https?:\/\/([^\/]*\.)?facebook\.com(:\d+)?/i', $url)) {
    // make sure facebook.com url's load in the full frame so that we don't
    // get a frame within a frame.
    echo "<script type=\"text/javascript\">\ntop.location.href = \"$url\";\n</script>";
  } else {
    header('Location: ' . $url);
  }
  exit;
}
Right. You don't need a "real" HTTP 302 redirect, you need to send a special snippet of FBML. It's easy to forget that Facebook is its own little Bizarro World until you step just a little outside the approved way of doing things.
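For the record, the encoding I suspected first is easy to check on its own. This is a small sketch that rebuilds the login URL from the error message; `build_login_url` is a helper name I made up, and the parameter layout is simply copied from that URL rather than from any official client.

```python
from urllib.parse import quote

LOGIN_URL = "http://www.new.facebook.com/login.php"


def build_login_url(api_key, next_url):
    """Rebuild the login redirect URL with a percent-encoded 'next' value."""
    # safe="" makes quote() escape ":" and "/" too, which is what a URL
    # nested inside a query parameter needs.
    return (
        LOGIN_URL
        + "?v=1.0&canvas"
        + "&next=" + quote(next_url, safe="")
        + "&api_key=" + quote(api_key, safe="")
    )


if __name__ == "__main__":
    print(build_login_url("643f14fgg7294eff",
                          "http://apps.facebook.com/historyradar/"))
```

It produces the same string that appears in the error message, which is how I knew the encoding was not the problem.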

[1] No, not "the url is not my son." I mean one with "login.php" and "url" and "not valid" in it somewhere.


Wednesday, March 11, 2009

Linked Data: End-User Applications?

Linked Data is a common-sense set of rules on how to use big-S Semantic Web technology to publish data on the Web. More or less[1]:
  • Publish your data as RDF[2].
  • Use HTTP URIs to name resources in your RDF.
  • Make the URIs dereferenceable, and have them return more RDF.
  • Use a standard schema language (probably OWL[3]).
It's also probably safe to assume that the RDF used in Linked Data is "informal" and that there's a lot of it[4].

Linked Data has been gaining momentum lately as sort of a down-to-earth version of the Semantic Web. The fact that it's relatively low-effort[5] means that there is quite a bit of data being published, despite the fact that very few applications outside the laboratory make use of it.

Linked Data has obvious uses inside the enterprise. The BBC has done some great work using web development techniques based on a combination of hard-core enterprise integration and brand-spanking-new semantic web technology. Linked Data behind (and just in front of) the firewall is an exciting area, and I believe that is where it will find its first widespread commercial acceptance. But that's another blog post.

This post is about real end-user applications that take advantage of Linked Data. By "real" I mean something that sits on your desktop (or in your phone, or maybe even web browser) with a rich user interface tailored for a particular task. I don't mean generic browsers or infrastructure components.

Since I had no clue what such an app would look like, I decided to take a generate-and-test approach: list the defining properties of Linked Data, come up with some consequences of those properties, then take the Cartesian product. Somewhere in there there must be a pony or two.

After some trial and error, I picked the following:
  • Is the application meant to handle more than a fixed set of data? For example, a browser like Marbles[6] is meant to navigate across all possible Linked Data, while other systems limit themselves to, say, social network graphs as expressed by FOAF, or even a fixed set of in-house data sources.
  • Is the linking visible to the user? For example, in a generic browser the user sees the links and chooses which ones to navigate across. The links are central. On the other hand, a social network browser might automatically choose how to spider a FOAF network, and present the user with a summarized view containing data from many sources.
  • Is reasoning important? That is, is the raw data presented to the user, or will new triples be generated (or filtered) using formal (or informal) reasoning?
For reasoning, it's pretty much yes or no:
  • Synthesizes new triples? { Yes, No }
The same goes for linking: either the program makes the underlying low-level links visible and primary, or it covers them up somehow:
  • Navigation? { User-visible Links, Invisible Links }
The extensibility question turns out to be a little more complicated than a simple Yes/No. While there are clearly some programs that are totally generic and others that are totally fixed, there are interesting cases in the middle. I added another possibility: applications that try to do something with data they don't totally understand (maybe by understanding ontology fragments like parts of FOAF or geodata).
  • Extensible? { Yes, Somewhat, No }
There's an interesting tradeoff, where the more specific the knowledge an application has about the data it works with, the better crafted the user experience can be.
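The generate-and-test step is mechanical enough to sketch. The three value lists below are the ones given above; the dictionary keys and the row order are my own choices for this illustration and are not meant to match the numbering of any particular table.

```python
from itertools import product

# The three characteristics and their possible values, as listed above.
EXTENSIBLE = ("Yes", "Somewhat", "No")
NAVIGATION = ("User-visible Links", "Invisible Links")
REASONING = ("Yes", "No")


def candidate_rows():
    """Return every combination of the three characteristics."""
    return [
        {"extensible": e, "navigation": n, "synthesizes_triples": r}
        for e, n, r in product(EXTENSIBLE, NAVIGATION, REASONING)
    ]


if __name__ == "__main__":
    for number, row in enumerate(candidate_rows(), start=1):
        print(number, row)
```

Three values times two times two gives twelve candidate kinds of application, which is small enough to eyeball. With a dozen characteristics instead of three, the same product would be far too large to inspect by hand.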

I am not happy with the list above. It leaves out some important characteristics, conflates others and is generally unsatisfactory. I originally had closer to a dozen characteristics, but the resulting combinatorial explosion made things awkward. But the list is just for inspiration, and so I'm willing to live with it.

And the results:



Linked Data applications like browsers and analysis tools show up with their own categories. That's not surprising since that's how the categories were chosen.

I especially like the look of the "hybrid" applications: the ones that combine hardcoded knowledge of the data with the ability to process new data discovered through following links. If there's a pony to be found, I suspect it's in one of those rows.

I'm currently working on an iPhone application called "National Register Radar" that uses geolocation and a Linked Data version of the U.S. National Register of Historic Places database to help users maintain "situational awareness" of the history of the places around them. Right now it would be a first-row application: it has hardcoded knowledge of specific kinds of Linked Data, it hides the low-level linking and provides a summary view, and it presents the data as it finds it, with no logical reasoning.

Although it's a relatively immature use of Linked Data, hardcoding makes developing the initial version of the application much easier. It means that few external libraries are required (an important consideration on a mobile device with a non-x86 processor and no Java). It also means that traditional application development techniques apply: I don't need to mess around with an on-phone triple store and SPARQL queries.

I think, though, that I'd like to turn it into a row 6 "enhanced special-purpose browser" that's not restricted to just a few hardcoded data sets. It's unclear how all the technology would fit together (how can the application make use of DBpedia data without hardcoding? Can Fresnel fit on a phone? Can Fresnel be adapted for voice output?) but it's worth a try (and probably a follow-up blog post).

Ultimately, I suspect Linked Data will be more at home deep inside enterprise infrastructure than at the end-user level, but I'm going to give National Register Radar another couple of iterations and see where it ends up.

I'll be demoing an alpha version at BarCampAustin4. If you're interested in Linked Data (or speech synthesis) on the iPhone and are going to be in Austin, ping me and maybe we can get a BOF together...

[1] The "official" list is here: http://www.w3.org/DesignIssues/LinkedData.html, or even better, here: http://www4.wiwiss.fu-berlin.de/bizer/pub/LinkedDataTutorial/

[2] RDF is the data model for the semantic web. Think of it as an ultra-hyper-normalized database where everything has been reduced to three columns: <subject> <predicate> <object>. Database keys are in the form of URIs. Don't let incredibly over-elaborate explanations fool you: it really is that simple.
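To make the three-column picture concrete, here is a minimal sketch using plain tuples. The `http://example.org/` URIs and the sample facts are invented for illustration, and `match` is a made-up helper, not part of any RDF library.

```python
EX = "http://example.org/"  # hypothetical namespace, for illustration only

# Each row is (subject, predicate, object); the "keys" are URIs.
TRIPLES = [
    (EX + "place/1", EX + "name", "Old Courthouse"),
    (EX + "place/1", EX + "locatedIn", EX + "city/dallas"),
    (EX + "city/dallas", EX + "name", "Dallas"),
]


def match(triples, s=None, p=None, o=None):
    """Return the triples that fit the pattern; None acts as a wildcard."""
    return [
        t for t in triples
        if (s is None or t[0] == s)
        and (p is None or t[1] == p)
        and (o is None or t[2] == o)
    ]


if __name__ == "__main__":
    # Everything that has a name:
    print(match(TRIPLES, p=EX + "name"))
    # Where is place/1?
    print(match(TRIPLES, s=EX + "place/1", p=EX + "locatedIn"))
```

Because the object of one triple can be the subject of another, following `locatedIn` from the place to the city and then asking for the city's name is just two lookups in the same three-column table.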

[3] OWL is a popular schema language for RDF. Since reducing everything to triples removes pretty much all the type data, you need a schema language to add it back in. It's a little like a much more powerful version of XML Schema that lets you bake in the kind of semantics that would normally go into the comments. The documentation for it is uniformly awful.

[4] There's this really great article on just this topic that I can't seem to find a reference to right now. I'll edit it in later. Hey, it's a blog post, not a journal article, whadaya expect?

[5] It's relatively low effort because some pioneers have put in tons of effort developing some slick tools to help out.

[6] If it's up and running. Few of the commonly referenced Linked Data browsers worked reliably for me.
