
Sunday, February 22, 2009

Speech Synthesis on the iPhone with Flite



I was planning to use the Cocoa NSSpeechSynthesizer class to do text-to-speech (TTS) in a Linked Data based iPhone application, but Apple evidently isn't interested in my pitiful wants: the very slick OS X speech synthesis capabilities don't appear to be supported in iPhone OS. Not to be deterred, I managed to get the Flite speech synthesizer up and running on the phone. It had already been ported (multiple times) to the ARM architecture[1], so all I had to do was import the source files into Xcode.

Well, pretty much. Other than the tedium of importing all the files, there were a couple of things that needed figuring out:

I added the User-Defined Setting CST_AUDIO_NONE to turn off Flite's ability to play audio directly[2]. Alternatively, you could probably add a new implementation of the sound playing functions that used the iPhone audio framework, but that's too much work for a quick hack.

Since I just wanted a simple proof-of-concept I called the main routine[3] of the command-line "flite" utility directly from my application code:
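/* Synthesize msg to a WAV file at msg_file by calling flite's renamed main(). */
int play_message( const char* msg_file, const char* msg ) {
    /* main()-style argument vectors are conventionally NULL-terminated. */
    char* argv[6];
    argv[0] = "flite";
    argv[1] = "-t";              /* treat the next argument as the text to speak */
    argv[2] = (char*) msg;
    argv[3] = "-o";              /* write the audio to the named file */
    argv[4] = (char*) msg_file;
    argv[5] = NULL;
    return flite_main( 5, argv );
}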
I was torn over whether to set up an in-memory buffer for the sound data or write it to a file then play the file back immediately. Since flite_main() was already set up to write to a file I took the easy way out. The only mildly tricky thing was finding someplace where I could write the temporary file. Some cut and paste from the SpeakHere[4] sample application, and:
NSArray *filePaths = NSSearchPathForDirectoriesInDomains(
    NSDocumentDirectory,
    NSUserDomainMask,
    YES);
NSString *recordingDirectory = [filePaths objectAtIndex: 0];
NSString *tempFilePath = [recordingDirectory
    stringByAppendingPathComponent: @"recording.wav"];
/* UTF8String returns a const char*; play_message() does not modify its arguments. */
const char *path = [tempFilePath UTF8String];
play_message( (char *) path,
    "mister watson, come here, i need you." );
That got the sound file written out. To play it back I pointed AVAudioPlayer[5] at the newly generated sound file:
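NSError *err = nil;
AVAudioPlayer* audioPlayer = [[AVAudioPlayer alloc]
    initWithContentsOfURL:[NSURL fileURLWithPath:tempFilePath]
    error:&err];
if (audioPlayer == nil) {
    NSLog(@"Could not open %@: %@", tempFilePath, err);
}
[audioPlayer setDelegate:self];
[audioPlayer prepareToPlay];
BOOL plays = [audioPlayer play];  /* NO if playback could not start */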
I couldn't get AVAudioPlayer working without manually copying over the AVFoundation framework from /Developer/Platforms/iPhoneOS.platform/Developer/SDKs/iPhoneOS2.2.1.sdk/System/Library/Frameworks/AVFoundation.framework. I'm not sure that's the right thing to do, but it worked, and my iPhone now says "mister watson, come here, i need you" with an old-school Cylon accent.
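
For anyone who would rather take the in-memory route than write a temporary file, here is a rough sketch of the missing piece. It assumes you can get at the synthesized audio as 16-bit mono PCM samples plus a sample rate (how you pull those out of Flite is not shown here), and wraps them in a standard 44-byte RIFF/WAVE header. The name wav_from_samples() is made up for this example and is not part of Flite.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* RIFF stores every multi-byte integer in little-endian order. */
static void put_u16(uint8_t *p, uint16_t v) {
    p[0] = (uint8_t)(v & 0xff);
    p[1] = (uint8_t)((v >> 8) & 0xff);
}

static void put_u32(uint8_t *p, uint32_t v) {
    p[0] = (uint8_t)(v & 0xff);
    p[1] = (uint8_t)((v >> 8) & 0xff);
    p[2] = (uint8_t)((v >> 16) & 0xff);
    p[3] = (uint8_t)((v >> 24) & 0xff);
}

/* Hypothetical helper (not a Flite function): wrap mono 16-bit PCM samples
 * in a canonical 44-byte RIFF/WAVE header. Returns a malloc'd buffer that
 * the caller must free, or NULL if allocation fails. The total size in
 * bytes is stored in *out_len. */
uint8_t *wav_from_samples(const int16_t *samples, uint32_t num_samples,
                          uint32_t sample_rate, size_t *out_len)
{
    const uint16_t channels = 1;
    const uint16_t bits = 16;
    const uint16_t block_align = (uint16_t)(channels * bits / 8);
    const uint32_t data_bytes = num_samples * block_align;
    const size_t total = 44 + (size_t)data_bytes;

    uint8_t *buf = malloc(total);
    if (buf == NULL) {
        return NULL;
    }

    memcpy(buf, "RIFF", 4);
    put_u32(buf + 4, 36 + data_bytes);            /* bytes after this field */
    memcpy(buf + 8, "WAVE", 4);
    memcpy(buf + 12, "fmt ", 4);
    put_u32(buf + 16, 16);                        /* size of the fmt chunk */
    put_u16(buf + 20, 1);                         /* format tag 1 = PCM */
    put_u16(buf + 22, channels);
    put_u32(buf + 24, sample_rate);
    put_u32(buf + 28, sample_rate * block_align); /* bytes per second */
    put_u16(buf + 32, block_align);
    put_u16(buf + 34, bits);
    memcpy(buf + 36, "data", 4);
    put_u32(buf + 40, data_bytes);

    /* Copy the samples one at a time so the byte order is right on any host. */
    for (uint32_t i = 0; i < num_samples; i++) {
        put_u16(buf + 44 + 2 * (size_t)i, (uint16_t)samples[i]);
    }

    *out_len = total;
    return buf;
}
```

The resulting buffer could then be wrapped in an NSData and handed to AVAudioPlayer's initWithData:error: instead of a file URL. I have not tried this on a device, so treat it as a starting point rather than a tested recipe.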

[1] And to at least one framework useful on jailbroken phones. This thread seemed useful if you want to go that direction. I suspect other people must have gotten Flite working in Xcode as above, but I wasn't seeing anything on the first dozen pages of search results.
[2] CST_AUDIO_NONE is a standard Flite build flag. In Xcode, defining CST_AUDIO_NONE as a User-Defined Setting for a Target adds -DCST_AUDIO_NONE to the command line of the compiler. I prefer a good textual Makefile to the annoying graphical interface, but you take the bad with the good.
[3] After re-naming it to "flite_main()" to avoid a clash with the main routine of the iPhone application. There were actually a bunch of little fiddly bits along those lines, but I didn't document them as I went along and you'll have to figure them out on your own.
[4] SpeakHere is at http://developer.apple.com/iphone/library/samplecode/SpeakHere/index.html, but you probably need to be logged in to the iPhone Dev Center to see it.
[5] Again with heavy cut and paste from the SpeakHere sample code.
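
To make footnote [2] a little more concrete, here is a toy illustration of how a -D flag like CST_AUDIO_NONE typically selects code at compile time. This is not Flite's actual source: the demo_ function names are invented for the example, and the #define at the top stands in for the -DCST_AUDIO_NONE that Xcode would put on the compiler command line.

```c
#include <stdio.h>
#include <string.h>

/* Stand-in for passing -DCST_AUDIO_NONE on the compiler command line. */
#define CST_AUDIO_NONE 1

#ifdef CST_AUDIO_NONE
/* With the flag set, the "play" routine is compiled as a stub that
 * touches no audio hardware and simply reports failure. */
int demo_play_samples(const short *samples, int count) {
    (void)samples;
    (void)count;
    return -1;
}

const char *demo_audio_backend(void) {
    return "none";
}
#else
/* Without the flag, a platform-specific implementation would be built
 * instead. It is only hinted at here. */
int demo_play_samples(const short *samples, int count) {
    (void)samples;
    printf("would play %d samples on the native device\n", count);
    return 0;
}

const char *demo_audio_backend(void) {
    return "native";
}
#endif
```

With the define in place, demo_audio_backend() returns "none" and demo_play_samples() returns -1; remove the define and the other branch is compiled instead.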


You should follow me on twitter here.

Tuesday, February 03, 2009

OpenSocial: Behind the Corporate Firewall

OpenSocial is an API for consumer social network sites like LinkedIn, Orkut and Hi5, but its greatest value may come from use behind the corporate firewall.

OpenSocial is really two separate things: a portal framework that describes how embedded content can interact with a page, and a glued-on social network API that provides a way to access things like profile and friend data.

The portal framework is just Google's iGoogle framework rebranded, and the social-network API is a least-common-denominator mashup of the APIs from a selection of big commercial social networks, combined with a dash of influence from the Facebook API.

Both the portal framework and the social API are OK, but neither one is really best-of-breed. There are many widely deployed, battle-tested portal APIs available (JSR 168 and SharePoint spring to mind), and on the social side Facebook's API is (at least for the moment) superior.

But in this case, I'd argue worse is better.

The OpenSocial gadget API was designed to be used by web developers rather than corporate IT drones. Authoring a simple OpenSocial gadget is no harder than writing a web page, and the technology is nearly identical. It is much faster to get started with OpenSocial than to learn to program for a traditional corporate portal[1].

The social API is important, too. ERP systems (to optimize the use of your company's stuff) and CRM systems (to optimize the use of your company's customers) are important, but most companies claim their people are their most important resource[2]. Anybody who's spent time on Facebook or Twitter probably buys that there are advantages to including social network features in enterprise applications[3]. OpenSocial, as a well-documented, free-to-implement standard, is an obvious choice.

I've been spouting some of this stuff whenever I got the chance, so it was encouraging to see Atlassian (darling of the developer tools world) say something similar at their annual conference[4].

I have a selfish reason for wanting OpenSocial behind the corporate firewall: I'd like Praxis Bridge to have a way to keep in touch with students after the course is over, and I'd like the students to have a way to keep in touch with each other. A siloed social network for a limited-time event is useful, but a way to hook into the user's everyday working social network is much more valuable. And since the OpenSocial architecture gives container administrators control over the information leaked to external gadgets, I think it would have a shot at getting past (justifiably) paranoid corporate gatekeepers. But that's a whole 'nother blog post.

[1] Or, well, it seems that way, which is the same. Isn't the whole "worse is better" thing annoying?

[2] On a recruiting brochure you can probably assume a line like "people are our most important asset" is just the standard BS, but when a company's 10-Q says it you can assume they're serious.

[3] ERP systems have human resources modules, but that's not quite the same thing as being "social network enabled". On the other hand, the ERP vendors are going to start buying up "behind the firewall social network" vendors just as soon as the downturn ends, and at that point OpenSocial becomes just another standard ERP module.

[4] OpenSocial comes up in the keynote at about 9:45. There's a whole session about their new plugin architecture at http://blogs.atlassian.com/news/2009/01/atlascamp_video_1.html. Thanks to Tracy Snell for the pointer.
