Speech Synthesis on the iPhone with Flite
I was planning to use the Cocoa NSSpeechSynthesizer class to do text-to-speech (TTS) in a Linked Data based iPhone application, but Apple evidently isn't interested in my pitiful wants: the very slick OS X speech synthesis capabilities don't appear to be supported in iPhone OS. Not to be deterred, I managed to get the Flite speech synthesizer up and running on the phone. It had already been ported (multiple times) to the ARM architecture[1], so all I had to do was import the source files into Xcode.
Well, pretty much. Other than the tedium of importing all the files, there were a couple of things that needed figuring out:
I added the User-Defined Setting CST_AUDIO_NONE to turn off Flite's ability to play audio directly[2]. Alternatively, you could probably add a new implementation of the sound playing functions that used the iPhone audio framework, but that's too much work for a quick hack.
Since I just wanted a simple proof-of-concept I called the main routine[3] of the command-line "flite" utility directly from my application code:
int play_message( char* msg_file, char* msg ) {
    char* argv[5];
    argv[0] = "flite";
    argv[1] = "-t";
    argv[2] = msg;
    argv[3] = "-o";
    argv[4] = msg_file;
    return flite_main( 5, argv );
}

I was torn over whether to set up an in-memory buffer for the sound data or write it to a file and then play the file back immediately. Since flite_main() was already set up to write to a file I took the easy way out. The only mildly tricky thing was finding someplace where I could write the temporary file. Some cut and paste from the SpeakHere[4] sample application, and:
NSArray *filePaths = NSSearchPathForDirectoriesInDomains(
    NSDocumentDirectory,
    NSUserDomainMask,
    YES);
NSString *recordingDirectory = [filePaths objectAtIndex: 0];
NSString *tempFilePath = [NSString
    stringWithFormat: @"%@/%s",
    recordingDirectory, "recording.wav"];
const char *path = [tempFilePath UTF8String];
play_message( (char *)path,
    "mister watson, come here, i need you." );

That got the sound file written out. To play it back I pointed AVAudioPlayer[5] at the newly generated sound file:
NSError *err = nil;
AVAudioPlayer* audioPlayer = [[AVAudioPlayer alloc]
    initWithContentsOfURL:[NSURL fileURLWithPath:tempFilePath]
    error:&err];
[audioPlayer setDelegate:self];
[audioPlayer prepareToPlay];
BOOL plays = [audioPlayer play];

I couldn't get AVAudioPlayer working without manually copying over the AVFoundation framework from /Developer/Platforms/iPhoneOS.platform/Developer/SDKs/iPhoneOS2.2.1.sdk/System/Library/Frameworks/AVFoundation.framework. I'm not sure that's the right thing to do, but it worked, and my iPhone now says "mister watson, come here, i need you" with an old-school Cylon accent.
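Incidentally, the NSString gymnastics above are only there to ask the system where the Documents directory lives; the final path assembly is plain string concatenation, which you could just as well do in C. A sketch, with a hard-coded directory standing in for whatever NSSearchPathForDirectoriesInDomains actually returns on your device:

```c
#include <stdio.h>
#include <string.h>

/* Build "<dir>/recording.wav" into buf, mirroring the
   stringWithFormat: call above. Returns buf for convenience. */
char *temp_wav_path(char *buf, size_t bufsize, const char *dir) {
    snprintf(buf, bufsize, "%s/%s", dir, "recording.wav");
    return buf;
}
```

Not that the Objective-C version is wrong, but it's a reminder that only the directory lookup needs Foundation.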
[2] CST_AUDIO_NONE is a standard Flite build flag. In Xcode, defining CST_AUDIO_NONE as a User-Defined Setting for a Target adds -DCST_AUDIO_NONE to the command line of the compiler. I prefer a good textual Makefile to the annoying graphical interface, but you take the bad with the good.
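For reference, the Makefile equivalent of that User-Defined Setting is just one line (assuming CFLAGS is what your build actually hands to the compiler):

CFLAGS += -DCST_AUDIO_NONE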
[3] After re-naming it to "flite_main()" to avoid a clash with the main routine of the iPhone application. There were actually a bunch of little fiddly bits along those lines, but I didn't document them as I went along and you'll have to figure them out on your own.
[4] SpeakHere is at http://developer.apple.com/iphone/library/samplecode/SpeakHere/index.html, but you probably need to be logged in to the iPhone Dev Center to see it.
[5] Again with heavy cut and paste from the SpeakHere sample code.
Labels: engineering, iphone, speech