
Thursday, April 07, 2011

Three Books

Too long for Twitter: Three books the people who actually build high-tech products should read:

Crossing the Chasm by Geoffrey Moore. Just read the whole thing cover-to-cover. Sure, it talks about ways to move from early to mainstream markets, but it also explains just what a market is, the difference between a cool technology and a "Whole Product" that's useful enough for somebody to actually pay for, and how to care for, feed, and eventually abandon early adopters. Moore didn't invent all (or even most of) the ideas in the book, but he took a bunch of esoteric information and turned it into a coherent, easy-to-read strategy for taking technology products to market.

The Four Steps to the Epiphany by Steven Gary Blank. Unlike Crossing, this one's a bit of a slog, but if you're selling to businesses rather than consumers, resist the temptation to skim it or read one of the "lite" versions floating around. Blank takes a bunch of industry best practices, mixes in his own hard-won wisdom, and produces a plan for turning an idea into a workable business. Spend a day or two reading through Crossing to get some background, then spend as long as it takes to get through Four Steps. It's worth it.

Solution Selling by Michael T. Bosworth. There's "The New Solution Selling" by Eades and "SPIN Selling" by Rackham, but the takeaway should be the same no matter which you choose: selling is not a black art that only born salesmen can understand. Selling is a rational process of matching customer needs to your solutions, and you (yes you, geek engineer) are fully capable of doing it. The key is that we're not talking about hucksters selling vacuum cleaners door-to-door; we're talking about selling expensive, complex, high-tech solutions in an ongoing relationship. And that means actually solving your customer's problem is important. And that firmly aligns the selling with the engineering. A good engineer is still going to need to put in some work to become a competent high-tech salesperson, but engineers have a huge advantage over non-engineers in this arena.

Four Steps has a great appendix with many more recommendations, but those three are the must-reads I'd want anyone I was working with (CEO to engineering intern) to have read and understood.

You should follow me on twitter here.

Saturday, March 05, 2011

Dallas Big Data Micro Hackathon

[Image: Tape library, CERN, Geneva]
On Tuesday, March 22nd, at Cohabitat in Dallas, Texas, a group of geeks is getting together to write code to process Big Data. There will likely be a group getting Hadoop up and running on laptops ("Hello, World" level), and possibly a group working with Cassandra and MapReduce on EC2 ("I actually have a real-life problem to solve" level), but anyone who's serious about Big Data is invited.

"Big Data" isn't a formally defined term, but the general idea is that if your data set could fit on a single disk or be easily hosted inside a normal relational database it's not big data. Google calculating PageRank for every site they index? LinkedIn doing a complete social graph analysis for every user? Those are definitely Big Data problems. Technologies like MapReduce (see Hadoop) and certain kinds of highly scalable NoSQL databases (see Cassandra) fall under the heading, but it's a big tent and there are many other possibilities.
The event is free, but the idea is to keep the group small and focused, so spaces are limited. Sign up at: http://dbdmh.eventbrite.com


You should follow me on twitter here.

Tuesday, May 04, 2010

Frustrated with Dallas

Union Square Ventures, the New York-based venture capital firm famous for its Web 2.0 investments (Foursquare, Meetup, tumblr, Twitter, delicious, Feedburner, etc.), recently announced that they're hiring an investment analyst and a general manager. They invited anybody who was interested to send them links to their web presence, and the staff at USV would use that instead of a resume. Kinda cool.

In a follow-up post, they mapped the applicants. Nobody from Dallas applied for the general manager position, and only a couple could stir themselves to apply for the analyst spot.

Anybody who wanted to could apply. No old-boy network, no experience absolutely required (although they did have some "ideal candidate" desiderata). And Dallas just couldn't be bothered.

I'm very frustrated with Dallas.

You should follow me on twitter here.

Sunday, February 14, 2010

Linked Data, Confidence Games and the Transitivity of Trust

Over the Christmas holidays I took my family on a five thousand mile roadtrip around the American West. It took a couple of weeks and I expected to spend a lot of time on my favorite user-generated travel review site.

And I did spend a lot of time on the site, enough to eventually figure out that it had been comprehensively infiltrated by review spammers. Some of the spam reviews were obvious: "I loved this place! Five stars!" when all the rest of the reviews were negative. Some were more devious: "There were bedbugs! They spat in my soup! Zero stars!" when all the other reviews were stellar. In other cases it was much harder to tell, and in all cases the average rating was highly suspect.

Turns out there are companies that specialize in vandalizing review sites[1]. The companies employ actual humans who spend actual creative effort to craft misleading reviews. They even set up realistic user profiles, and on some sites they add each other as friends. In other words, it's considered worthwhile to spend real time and effort on this stuff.

It's been suggested that there's a technological solution: if reviewers are part of a social network, it's possible to extract some useful statistics that might help determine whether a given reviewer is real or fake.

If the reviewer is a friend, that's obviously useful information. But there's very little chance that some random reviewer is your friend.

But what if the reviewer is part of your extended social network? Surely the fact that somebody is a friend of a friend is some indication that they're trustworthy, or at least that they're a real person.

Nope.

First off, with a fan-out of 200 friends, the 2nd-level extended social graph is around 40,000 people. Allowing for annoying people who friend everybody, an extended social graph could easily include a substantial portion of the entire population of the planet. All it takes is a couple of mistaken friend-adds to get you hooked up to a spammer-created sub-network. Even if you're careful, it's overwhelmingly likely that some friend-of-a-friend isn't.
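The arithmetic behind that is simple enough to check (assuming a uniform fan-out of 200 and ignoring overlap between friend lists, which obviously overstates things):

    public class FanOut {
        public static void main(String[] args) {
            long friends = 200;
            System.out.println(friends * friends);           // ~40,000 people two hops out
            System.out.println(friends * friends * friends); // ~8,000,000 three hops out
        }
    }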

So, trust is clearly not transitive and the idea of a "web of trust" cannot be taken literally[2].

In most cases, it's only possible to determine if the "shape" of the reviewer's social graph is reasonable. That is, are they friends with other plausible-looking people? Are many of their friends known fake profiles? Do they have a realistic number of friends? Etc.
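A plausibility check along those lines might look something like this sketch. The thresholds and the notion of a known-fake-profile list are invented for illustration; a real system would need many more signals:

    import java.util.Set;

    public class ProfileShapeCheck {
        // Crude "shape" heuristics: does the reviewer have a realistic number
        // of friends, and are most of those friends not already known fakes?
        public static boolean looksReal(Set<String> friendIds, Set<String> knownFakeIds) {
            if (friendIds.size() < 10 || friendIds.size() > 2000) {
                return false; // implausibly few or implausibly many friends
            }
            int fakeFriends = 0;
            for (String id : friendIds) {
                if (knownFakeIds.contains(id)) {
                    fakeFriends++;
                }
            }
            return fakeFriends < friendIds.size() * 0.1; // mostly non-fake friends
        }
    }

Which is exactly the kind of check the next paragraph points out is trivial to game.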

But that's trivial to game. Even if there are obstacles to a totally automated approach, the application of ultra-cheap human labor makes it easy to set up a fake social network on any given site.

Linked data and distributed social graphs (à la FOAF + SSL) make things worse, because while it used to take some amount of human effort to solve the captchas and create new accounts on a social-graph silo like Facebook, with a distributed "web of trust" approach it can all be completely automated.

That isn't to say FOAF + SSL isn't a neat replacement for the monstrosity that OpenID has become, but the "web of trust" part won't fly.

That said, in some sense it doesn't really matter. I'm certainly not arguing that we should slow our rush towards a semantic web. The benefits are too great. But given the experience with email spammers and review fraudsters, it might be a good idea to be open about the fact that we're also introducing new hazards.

[1] So, honestly, I only have anecdotal evidence. But it doesn't seem like a very controversial assumption.

[2] "Trust" is a complicated word. It's not that knowing a review is by a friend-of-a-friend-of-a-friend isn't useful information, it's that using it to make a binary yes/no trust decision is misguided. There's been some interesting academic research in this area, Wikipedia has a rundown: http://en.wikipedia.org/wiki/Web_of_trust In what seems like a perfectly sensible approach, this paper: http://www.mindswap.org/papers/Trust.pdf suggests using social graph information as just one input into a full spam handling system.

You should follow me on twitter here.

Friday, July 10, 2009

303 Madness and the Giant Global Graph

I had the opportunity to do a short talk at the latest Semantic Web Dallas meetup. I decided on an overview of the 303-redirect dance that differentiates a URI that points to a web page from a URI that names a concept in the Semantic Web. Yes, there's a difference. Yes, it's an important difference. Probably. In any case, it's a good topic for a 10-minute talk because having to listen to stuff like this for more than ten minutes at a time can lead to bleeding from the ears. It's a complex issue with an, uhh, unexpected? solution, best approached with a sense of humor. Well, maybe not best approached that way, but it seemed like a good idea at the time. And the list of references at the end is pretty good.
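For the impatient, the dance itself is small: a request for the URI that names the concept gets a 303 See Other redirect to a separate document that describes the concept. A minimal sketch as a servlet (the /id/ and /doc/ path convention is just a common one, not anything mandated by the spec):

    import java.io.IOException;
    import javax.servlet.http.HttpServlet;
    import javax.servlet.http.HttpServletRequest;
    import javax.servlet.http.HttpServletResponse;

    public class ConceptRedirectServlet extends HttpServlet {
        @Override
        protected void doGet(HttpServletRequest req, HttpServletResponse resp)
                throws IOException {
            String uri = req.getRequestURI();
            if (uri.startsWith("/id/")) {
                // /id/whatever names a concept, not a web page, so send the
                // client off to the document that describes the concept.
                resp.setStatus(HttpServletResponse.SC_SEE_OTHER); // 303
                resp.setHeader("Location", uri.replaceFirst("^/id/", "/doc/"));
            } else {
                resp.sendError(HttpServletResponse.SC_NOT_FOUND);
            }
        }
    }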



If you read the references you'll learn that you can also use URLs with fragment identifiers in your RDF. But doing it that way doesn't involve a fundamental redefinition of part of HTTP, so it's a lot less entertaining.

You should follow me on twitter here.

Thursday, July 02, 2009

Popstat on Google App Engine

Popstat is the demo application from my Facebook Dev Garage Dallas presentation. It just posts a status message to Facebook and Twitter to demonstrate using both Facebook Connect and an external service. I developed it on my laptop and didn't have time to move it to a public host before the event. I wanted it out there live someplace, and figured it was a good opportunity to try out Google App Engine's Java support (Popstat uses Grails with a mix of Groovy and Java).

I got it all working, but it was a pain.
  • I used the Grails AppEngine plugin. I liked it.
  • App Engine provides storage, but not in the form of a relational database. It's close enough that JPA and JDO both work (but not Hibernate, yet). I chose JPA, but either way you'll need to annotate your domain classes (I expected the GORM-JPA plugin to do that for me, but it didn't).
  • You'll need to put your domain classes into named packages. Things (silently) don't go well if you leave them in the default package.
  • If you're using JPA, domain classes will need to explicitly declare an id field. Make it a Long, and add the @Id and @GeneratedValue annotations. Use GenerationType.IDENTITY (there's a sketch of what this ends up looking like just after this list).
  • I was able to use the dynamic save() method provided by GORM-JPA, but I had to wrap up the calls in a withTransaction block, and the semantics are slightly different (use merge() instead of save() for updates).
  • Depending on your version of Spring, you may get a message along the lines of "org.springframework.context.annotation.internalPersistenceAnnotationProcessor': Initialization of bean failed" with something about "java.lang.NoClassDefFoundError: javax/naming/NamingException". The fix here worked for me.
  • Popstat uses the facebook-java-api library. Since App Engine forbids the use of JAXB, I had to switch to the JSON version of the client to avoid an error about JAXBContext.
  • To talk to Twitter, Popstat uses the oauth-signpost library. But Signpost depends on Apache HttpClient, and HttpClient uses low-level Socket calls forbidden by App Engine. I hacked Signpost to use URLConnection, but I wouldn't recommend that approach. If I had to do it again, I'd look around for an OAuth library that worked out of the box.
  • By default, the App Engine Java Development Server (a version of the App Engine environment you can run on your local machine) binds to localhost only. The command-line client has a "--address" option, but the "grails app-engine run" command doesn't. I hacked the plugin's scripts/AppEngine.groovy and hardcoded the address parameter into startDevServer().
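For what it's worth, here's roughly what an annotated domain class ends up looking like under JPA on App Engine, per the notes above. The package and class names are invented; Popstat's actual domain model may differ:

    package com.example.popstat; // a named package, not the default package

    import javax.persistence.Entity;
    import javax.persistence.GeneratedValue;
    import javax.persistence.GenerationType;
    import javax.persistence.Id;

    @Entity
    public class StatusPost {

        // Explicit Long id with @Id, @GeneratedValue and IDENTITY generation.
        @Id
        @GeneratedValue(strategy = GenerationType.IDENTITY)
        private Long id;

        private String message;

        public Long getId() { return id; }
        public void setId(Long id) { this.id = id; }
        public String getMessage() { return message; }
        public void setMessage(String message) { this.message = message; }
    }
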
There was some other stuff that I didn't take notes on, but (other than registration being turned off) Popstat is doing what it did before.

Overall, though, it wasn't a great experience. Google turns off random bits of Java (for security and ease of management), which means that very few third-party libraries are going to work. You'll probably have to do some porting of your own code as well. That, combined with the admin service being down all morning, left a bad taste. The free hosting thing is great for demo apps, but I think I'll stick to something like Amazon EC2 for real work. I'm very curious to see how Microsoft Azure stacks up (it's much more of a direct competitor to App Engine than the roll-it-all-yourself EC2).

You should follow me on twitter here.

Tuesday, June 30, 2009

The Semantic Web or The Generic at War with the Specific

It's easy to imagine an application that takes advantage of Linked Data by extracting just what it needs and dumping it into a local relational database. But that's clearly cheating. It's equally easy to imagine a completely generic low-level Linked Data browser, but there's something less than completely satisfying about that, too. The basic problem is that a rich user experience requires specifics, while taking full advantage of the "anyone can say anything about anything" nature of the semantic web means that applications must be able to handle almost totally generic data[1]. At least that was the theme of my presentation to the Dallas chapter of the IxDA earlier tonight...
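The "completely generic" end of that spectrum really is as generic as it sounds. With an RDF library like Jena, the core of a totally generic Linked Data browser isn't much more than this sketch (the DBpedia URI is just an example, and the package names are from current Jena releases; older versions lived under com.hp.hpl.jena):

    import org.apache.jena.rdf.model.Model;
    import org.apache.jena.rdf.model.ModelFactory;
    import org.apache.jena.rdf.model.Statement;
    import org.apache.jena.rdf.model.StmtIterator;

    public class GenericBrowser {
        public static void main(String[] args) {
            // Fetch whatever triples happen to be published at a Linked Data URI
            // and dump them, with no idea what any of them mean.
            Model model = ModelFactory.createDefaultModel();
            model.read("http://dbpedia.org/resource/Dallas");
            StmtIterator statements = model.listStatements();
            while (statements.hasNext()) {
                Statement s = statements.next();
                System.out.println(s.getSubject() + " " + s.getPredicate() + " " + s.getObject());
            }
        }
    }

It works on any data anybody publishes, and it tells the user almost nothing they actually care about. That tension is the whole talk.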



I'm especially proud of the way I failed to force people to sit through a detailed explanation of graph structures, subject-predicate-object triples, the use of URIs as identifiers, or any of the other traditional cruft that obscures the capabilities of semantic web technology under a morass of unnecessary detail. (Imagine introducing relational databases by first forcing people to understand index paging mechanisms, or learning to cook via an explanation of organic chemistry). The audience seemed to appreciate it.

[1] I struggled with this earlier over in /2009/03/linked-data-end-user-applications.html

[2] The translation from Keynote to Powerpoint to Google docs was not without problems. And you will definitely need to click through and get a larger version to read some of the screens.

You should follow me on twitter here.