Wednesday, December 13, 2006

Amazon S3

It's about four months late, but this weekend I decided to try out Amazon's Elastic Compute Cloud with an eye towards moving off the hosted server I use for DayTripr. For those living in a cave, EC2 lets you upload virtual machine images to Amazon, where they will be run on Amazon's massive compute farm. EC2 will revolutionize the world, is both a floor wax and dessert topping, and will render traditional hosting services obsolete any minute now.

You'll notice that the title of this post is not "Amazon EC2", however. That's because the Beta program is currently full. Darnit. But to use EC2 effectively, you've got to understand Amazon's Simple Storage Service, so I decided to spend a few hours playing with S3 as a consolation prize.

I'll eventually do a full writeup over on Distributopia, but for now I've just got a few comments:
  • The S3 SOAP interface is largely worthless. Amazon made the extremely dubious decision to go with the long-abandoned Microsoft-only[1] DIME format for SOAP attachments. You can include data in the body of a request, but for big files it's best to use attachments. Admittedly, there's a gaping void in the SOAP world where attachments should go, but still, DIME? Sheesh. So use the REST interface.
  • The S3 protocol is conceptually simple, but uses a custom digital-signature setup that is a pain to get right. So unless you just want to play around a bit, use a toolkit. The Java-based toolkit Amazon provides is incomplete (no permissions API), so use a third-party library. jets3t was the first one I tried and I was happy with it.
  • The EC2 instances are not persistent. You get some reasonably large amount of virtual disk space, but it's not absolutely guaranteed to always be there. The suggestion is to ship stuff you want to keep over to S3 (they're meant to work as a pair, and the EC2 to S3 bandwidth is free), but if you're using an RDBMS that presents certain problems. Doing hourly backups and shipping those over is one solution, but a very much cooler one is to set up a distributed file system. The reason this is a totally cool solution is that it lets you finally make practical use of that stupid "Advanced File Systems" course you took back in college. Take that, non-computer-science majors.
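
To give a flavor of the signature setup from the second bullet: as I understand the REST authentication scheme, you build a canonical string out of the HTTP verb, a few headers, and the resource path, HMAC-SHA1 it with your secret key, and send the Base64 result in the Authorization header. Here's a minimal Python sketch of that idea. The function names, the placeholder keys, and the bucket path are mine, made up for illustration. It also leaves canonicalizing the resource path to the caller and doesn't handle multi-valued headers, which is exactly why a toolkit is the better bet.

```python
import base64
import hashlib
import hmac


def string_to_sign(verb, resource, date, content_md5="", content_type="",
                   amz_headers=None):
    """Build the canonical string that gets signed (simplified sketch)."""
    lines = [verb, content_md5, content_type, date]
    # x-amz-* headers go in lowercased, sorted by name, one "name:value" per line.
    normalized = sorted(
        (name.lower(), value.strip()) for name, value in (amz_headers or {}).items()
    )
    for name, value in normalized:
        lines.append(f"{name}:{value}")
    # The resource path (e.g. "/bucket/key") comes last, with no trailing newline.
    lines.append(resource)
    return "\n".join(lines)


def authorization_header(access_key, secret_key, verb, resource, date, **kwargs):
    """Return the value for the Authorization header: 'AWS <key id>:<signature>'."""
    to_sign = string_to_sign(verb, resource, date, **kwargs)
    digest = hmac.new(secret_key.encode("utf-8"),
                      to_sign.encode("utf-8"),
                      hashlib.sha1).digest()
    signature = base64.b64encode(digest).decode("ascii")
    return f"AWS {access_key}:{signature}"


if __name__ == "__main__":
    # Placeholder credentials and a hypothetical bucket/key, not real values.
    print(authorization_header(
        "MY-ACCESS-KEY-ID",
        "my-secret-key",
        "GET",
        "/my-bucket/photos/puppy.jpg",
        "Wed, 13 Dec 2006 12:00:00 +0000",
    ))
```

The Date you sign has to match the Date header you actually send, and any x-amz-* headers on the request have to be passed in too, or the server-side check fails. Get one newline wrong and you're rejected, hence "pain to get right."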
So, I wait patiently for my turn at the Beta trough so I can port over DayTripr and instantly be massively scalable[2]. I'm looking forward to it.

[1] Ok, so DIME isn't exactly MS-only. I know, I implemented DIME support for GLUE. But it might as well be. And besides, even MS has abandoned it. For those who like pain, here's MTOM, the latest in a long line of misbegotten specs from those lovable professional spec-writers at the W3C.

[2] Not.



3 Comments:

Anonymous said...

Christopher,

I'm the EC2 Beta Troll. If you are interested in gaining admission, spin around three times, click your heels together, and send me a note at buhr [at] amazin dot com.

Cheers,

Martin :)

9:44 AM  
Anonymous said...

I'm using EC2 w/ S3. The only "database" I'm running right now is a subversion repository. I do periodic backups of that thing to S3 and it works real nice. Had to do a little tinkering (a lot of tinkering) on top of the Amazon tools but it's pretty cool. While blowing the dust off your college textbooks might be fun in its own right, my take on the EC2 forums is that it will be "hard" to do that. I'm opting for good old fashioned backups (to S3).

1:34 PM  
Anonymous said...

Dang, the beta troll beat me to the punch. He's just too fast.

Note that you want to send the email to amazon dot com, not the other word he used.

3:28 PM  
