
Sunday, February 22, 2009

Speech Synthesis on the iPhone with Flite



I was planning to use the Cocoa NSSpeechSynthesizer class to do text-to-speech (TTS) in a Linked Data based iPhone application, but Apple evidently isn't interested in my pitiful wants: the very slick OS X speech synthesis capabilities don't appear to be supported in iPhone OS. Not to be deterred, I managed to get the Flite speech synthesizer up and running on the phone. It had already been ported (multiple times) to the ARM architecture[1], so all I had to do was import the source files into Xcode.

Well, pretty much. Other than the tedium of importing all the files, there were a couple of things that needed figuring out:

I added the User-Defined Setting CST_AUDIO_NONE to turn off Flite's ability to play audio directly[2]. Alternatively, you could probably add a new implementation of the sound playing functions that used the iPhone audio framework, but that's too much work for a quick hack.
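As a rough illustration of what a flag like CST_AUDIO_NONE does: the compiler's -D option defines a preprocessor symbol, and the source can test that symbol with #ifdef to swap a real audio backend for a stub. The sketch below is not Flite's code. The names `demo_audio_backend` and `demo_play` are hypothetical, and the flag is defined inline only so the example stands alone.

```c
/* Normally supplied on the compiler command line as -DCST_AUDIO_NONE.
   Defined here only so this sketch is self-contained. */
#ifndef CST_AUDIO_NONE
#define CST_AUDIO_NONE
#endif

/* Hypothetical helper: reports which backend was compiled in. */
const char *demo_audio_backend(void) {
#ifdef CST_AUDIO_NONE
    return "none";
#else
    return "device";
#endif
}

/* Hypothetical stand-in for a "play these samples" routine.
   Returns the number of samples handed to a device (0 with the stub). */
int demo_play(const short *samples, int num_samples) {
#ifdef CST_AUDIO_NONE
    (void)samples;
    (void)num_samples;
    return 0;               /* direct playback is compiled out */
#else
    /* A real backend would pass the samples to an audio device here. */
    (void)samples;
    return num_samples;
#endif
}
```

With the flag set, the playback path disappears at compile time, so nothing in the build needs a platform audio driver.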

Since I just wanted a simple proof-of-concept I called the main routine[3] of the command-line "flite" utility directly from my application code:
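int play_message(const char *msg_file, const char *msg) {
    /* Build the argument vector the command-line "flite" tool expects. */
    char *argv[5];
    argv[0] = "flite";
    argv[1] = "-t";               /* synthesize the text that follows */
    argv[2] = (char *)msg;
    argv[3] = "-o";               /* write the audio to this file */
    argv[4] = (char *)msg_file;
    return flite_main(5, argv);
}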
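The same pattern, a command-line tool's main() renamed and called in-process with a hand-built argv, can be tried without Flite. In the sketch below `demo_tool_main` is a hypothetical stand-in that only understands "-t TEXT" and "-o FILE" and copies the text into the file; it is not how flite_main() is implemented.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-in for a tool's main(), renamed so it can share a
   program with the application's own main(). */
int demo_tool_main(int argc, char **argv) {
    const char *text = NULL;
    const char *out_path = NULL;

    /* argv[0] is the program name, so option parsing starts at 1. */
    for (int i = 1; i < argc; i++) {
        if (strcmp(argv[i], "-t") == 0 && i + 1 < argc) {
            text = argv[++i];
        } else if (strcmp(argv[i], "-o") == 0 && i + 1 < argc) {
            out_path = argv[++i];
        } else {
            return 2;           /* unknown option or missing value */
        }
    }
    if (text == NULL || out_path == NULL) {
        return 2;               /* both options are required */
    }

    FILE *f = fopen(out_path, "w");
    if (f == NULL) {
        return 1;               /* could not open the output file */
    }
    fputs(text, f);
    fclose(f);
    return 0;
}

/* Wrapper in the style of play_message(): build argv by hand. */
int demo_write_message(const char *out_path, const char *msg) {
    char *argv[6];
    argv[0] = "demo";
    argv[1] = "-t";
    argv[2] = (char *)msg;
    argv[3] = "-o";
    argv[4] = (char *)out_path;
    argv[5] = NULL;             /* argv is conventionally NULL-terminated */
    return demo_tool_main(5, argv);
}
```

One thing to watch with this shortcut in general: a main() written for a standalone tool may call exit() on errors or rely on global state, either of which is unwelcome inside a long-running application.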
I was torn over whether to set up an in-memory buffer for the sound data or write it to a file then play the file back immediately. Since flite_main() was already set up to write to a file I took the easy way out. The only mildly tricky thing was finding someplace where I could write the temporary file. Some cut and paste from the SpeakHere[4] sample application, and:
NSArray *filePaths = NSSearchPathForDirectoriesInDomains(
    NSDocumentDirectory,
    NSUserDomainMask,
    YES);
NSString *recordingDirectory = [filePaths objectAtIndex:0];
// Build <Documents>/recording.wav for the temporary sound file.
NSString *tempFilePath = [recordingDirectory
    stringByAppendingPathComponent:@"recording.wav"];
// UTF8String returns a const pointer, so keep it const.
const char *path = [tempFilePath UTF8String];
play_message(path,
    "mister watson, come here, i need you.");
That got the sound file written out. To play it back I pointed AVAudioPlayer[5] at the newly generated sound file:
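NSError *err = nil;
AVAudioPlayer *audioPlayer = [[AVAudioPlayer alloc]
    initWithContentsOfURL:[NSURL fileURLWithPath:tempFilePath]
    error:&err];
if (audioPlayer == nil) {
    // The file was missing or unreadable; err says why.
    NSLog(@"Could not open %@: %@", tempFilePath, err);
} else {
    [audioPlayer setDelegate:self];
    [audioPlayer prepareToPlay];
    BOOL plays = [audioPlayer play];  // NO if playback could not start
}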
I couldn't get AVAudioPlayer working without manually copying over the AVFoundation framework from /Developer/Platforms/iPhoneOS.platform/Developer/SDKs/iPhoneOS2.2.1.sdk/System/Library/Frameworks/AVFoundation.framework. I'm not sure that's the right thing to do, but it worked, and my iPhone now says "mister watson, come here, i need you" with an old-school Cylon accent.
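For the in-memory alternative mentioned above, the main chore is wrapping raw samples in a WAV header without touching the file system. The sketch below assumes mono, 16-bit PCM samples and builds a complete WAV image in a caller-supplied buffer. `demo_wav_size` and `demo_build_wav` are hypothetical names, not Flite functions, and wiring them to Flite's output is left open.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

enum { DEMO_WAV_HEADER_BYTES = 44 };

/* WAV fields are little-endian regardless of the host CPU. */
static void put_u16le(uint8_t *p, uint16_t v) {
    p[0] = (uint8_t)(v & 0xff);
    p[1] = (uint8_t)(v >> 8);
}

static void put_u32le(uint8_t *p, uint32_t v) {
    p[0] = (uint8_t)(v & 0xff);
    p[1] = (uint8_t)((v >> 8) & 0xff);
    p[2] = (uint8_t)((v >> 16) & 0xff);
    p[3] = (uint8_t)((v >> 24) & 0xff);
}

/* Bytes needed for a mono 16-bit WAV holding num_samples samples. */
size_t demo_wav_size(size_t num_samples) {
    return DEMO_WAV_HEADER_BYTES + num_samples * 2;
}

/* Fills out[] with header plus samples. Returns the number of bytes
   written, or 0 if out_cap is too small. */
size_t demo_build_wav(const int16_t *samples, size_t num_samples,
                      uint32_t sample_rate, uint8_t *out, size_t out_cap) {
    size_t total = demo_wav_size(num_samples);
    uint32_t data_bytes = (uint32_t)(num_samples * 2);
    if (out_cap < total) {
        return 0;
    }

    memcpy(out, "RIFF", 4);
    put_u32le(out + 4, 36 + data_bytes);   /* size of everything after this field */
    memcpy(out + 8, "WAVE", 4);
    memcpy(out + 12, "fmt ", 4);
    put_u32le(out + 16, 16);               /* size of the fmt chunk body */
    put_u16le(out + 20, 1);                /* format 1 = uncompressed PCM */
    put_u16le(out + 22, 1);                /* one channel */
    put_u32le(out + 24, sample_rate);
    put_u32le(out + 28, sample_rate * 2);  /* bytes per second */
    put_u16le(out + 32, 2);                /* bytes per sample frame */
    put_u16le(out + 34, 16);               /* bits per sample */
    memcpy(out + 36, "data", 4);
    put_u32le(out + 40, data_bytes);

    for (size_t i = 0; i < num_samples; i++) {
        put_u16le(out + DEMO_WAV_HEADER_BYTES + i * 2, (uint16_t)samples[i]);
    }
    return total;
}
```

On the phone, a buffer like this could presumably be wrapped in an NSData and handed to AVAudioPlayer's initWithData:error: instead of a file URL, which would avoid the temporary file entirely.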

[1] And to at least one framework useful on jailbroken phones. This thread seemed useful if you want to go that direction. I suspect other people must have gotten Flite working in Xcode as above, but I wasn't seeing anything on the first dozen pages of search results.
[2] CST_AUDIO_NONE is a standard Flite build flag. In Xcode, defining CST_AUDIO_NONE as a User-Defined Setting for a Target adds -DCST_AUDIO_NONE to the command line of the compiler. I prefer a good textual Makefile to the annoying graphical interface, but you take the bad with the good.
[3] After re-naming it to "flite_main()" to avoid a clash with the main routine of the iPhone application. There were actually a bunch of little fiddly bits along those lines, but I didn't document them as I went along and you'll have to figure them out on your own.
[4] SpeakHere is at http://developer.apple.com/iphone/library/samplecode/SpeakHere/index.html, but you probably need to be logged in to the iPhone Dev Center to see it.
[5] Again with heavy cut and paste from the SpeakHere sample code.


34 Comments:

Anonymous Anonymous said...

Nice. This was on my long list of iPhone projects. Your success will push my TTS project higher up my list.

9:15 AM  
Blogger XML Aficionado said...

Would you mind sharing the sources as a gzip?

11:03 AM  
Anonymous Anonymous said...

I'm really struggling trying to get this to build in XCode, can you either post your sample project, or give some more detail on how to get it to build?

Thanks a ton!

3:57 PM  
Blogger cks said...

I started with an empty iPhone project. I created a folder hierarchy (in Xcode) for the sources, starting with an include folder for the headers and a util directory for the util stuff. I initially imported files one at a time, using error messages to guide me as to what was needed. It turns out you can get most of the utils stuff compiling pretty easily (but note that there is a subset of the Flite files that are platform specific and should be left out).

After I got the majority of the utils stuff building, I wrote some small test applications and actually transferred them to the phone. I seem to remember one did socket access, and another did some string manipulation. At least one of the util files has references outside the utils directory, however, and at that point I could compile but not link. I gradually added folders and files from the source until the whole thing compiled and a test program linked. I didn't have to make any significant changes to the source (but note the build flags I mention in the blog post).

It was tedious and boring and took a couple of hours, but it wasn't particularly hard. Unfortunately, I also made a mess of the project and the file locations, so the project I'm actually using is not really fit to be distributed. If I ever get around to cleaning it up (and that won't be soon), I may make up a distributable package, but until then, you're all on your own, sorry.

6:40 PM  
Anonymous Anonymous said...

Not sure where you included that CST_AUDIO_NONE build flag in your code?

As if the flite directories and source files weren't confusing enough, I've had a hard time trying to include all the necessary files and figure out what twiddling is needed (as per [3]).

I seem to have made the xcode project able to build and distribute to iphone OS 2.2.1, however, the only text that is synthesized in the .wav example (with AVAudioPlayer) are numbers!

I can't get a text string to read.

At first, I had not included files such as "cmu_lex_num_bytes.c" or "cmu_lex_data_raw.c" because they yield errors due to being simply a list of values (which I then hard-coded into the other 'cmulex' .c files that referred to importing the list of values).

However, this did not make a textual string read either.

Is there something else major we're missing? We could not find the correct way to use that CST_AUDIO_NONE build flag in the xcode project.

Does your procedure require a command line build/make of the .tar source from flite before importing to xcode project?

We are relative n00bs, if you can't already tell. But, we are making significant strides quickly, and any help would be appreciated. Thanks!

3:29 PM  
Anonymous Anonymous said...

Thanks, the port worked perfectly. I may be posting the source on Google Code soon!!

6:20 PM  
Anonymous Anonymous said...

Hi, I have downloaded flite-1.3-release from http://www.speech.cs.cmu.edu/flite/download.html. I want to use this package in Xcode iphone development. I just included folder to my project folder, but am getting lots of error. Can anyone explain me how to use this API to make application for iphone?

Niketa

3:23 AM  
Anonymous Anonymous said...

Hello,
Can you share the XCode project here, please?

Or give the link to google code project?

11:23 AM  
Blogger Unknown said...

Hey it worked! Only took an hour.
Thanks

12:59 PM  
Blogger Niki said...

This comment has been removed by the author.

5:37 AM  
Anonymous Anonymous said...

hi,
Actually I'm not able to include flite properly. I don't know which files from the flite folder that I downloaded have to be included in an iPhone application.

If anyone knows then please tell.

5:38 AM  
Anonymous Anonymous said...

hi brian,
if you have done this successfully, can you share the complete steps or code for text to speech conversion on the iPhone?

6:14 AM  
Anonymous Anonymous said...

Does this also work in the reverse. Voice to text. I am looking for sub app to allow phone to drop text by voice into application

12:39 PM  
Blogger Unknown said...

hi,
I am struggling very hard to make this up. Can anybody share the code or list out the steps to make this work. This will help out a lot of developers like me working on similar projects...
Thanks a lot.

11:06 PM  
Blogger Nomar said...

If someone has this working and would upload their xcode project, I think a lot of people could benefit

5:00 PM  
Anonymous Anonymous said...

Hi Christopher,

I got flite to work on the iPhone.
I also developed a module to convert the flite wave straight in an NSData object in .wav format. Flite does TTS on the fly and speaks without writing to a file.

See www.voxtrek.com to see why I needed flite!

Thanks for the inspiration.
Cheers, Yannick

3:50 PM  
Anonymous Anonymous said...

Hi Yannick,

would it be possible for you to share your XCode project with all the improvements that you made?

I am having a hard time to get this to work and I am sure that your knowledge would benefit several other developers as well.

Thanks.

12:31 PM  
Blogger Sam said...

Hi. I've used the technique described in this blog post to create a simple text to speech iPhone application using Flite. I posted the source code (Xcode bundle) at my web page for other developers who are trying to figure this out - http://cmang.org/ - TTS (Text-to-Speech). It does compile with a lot of warnings related to signed/unsigned mismatches - a lot of the code can probably be completely taken out. But it works fine generating the .wav to the filesystem and playing it back (and then removing it). I also started to work on a method to have Flite write the raw wave data into a struct in memory so that it can be encapsulated in NSData as described in a post above, but this part is incomplete and may stay that way indefinitely. Thanks to everyone who's been working on this and posted in this thread. -Sam Foster/cmang

2:21 AM  
Blogger cks said...

Sammy: Nice! Thanks for publishing a project. I'll point people in your direction when they ask.

6:40 AM  
Anonymous Anonymous said...

Can anyone help me with sample code? The input is text and the output is a wav or any file (in short, text to voice). I am not able to compile the available source and I am new to Xcode.

2:20 AM  
Blogger Praveen Sharma said...

@Sammy, CKS - I'll highly appreciate if you can please share the xcode project. I am sure many other will also get the benefit and appreciate your help.

1:13 AM  
Anonymous Anonymous said...

What if I just need to speak numbers (or dates and times)?

Has anyone seen any iphone source-code for that?

(I assume I would start with a bunch of individual sound files, each containing 1 number.)

9:18 AM  
Anonymous Anonymous said...

Is there any way for Speech Recognizer?

2:29 AM  
Anonymous Anonymous said...

would calling the command line as you are doing instead of using a library pass apple's appstore checks? or is it not allowed?

5:46 PM  
Blogger Andrew said...

How can you compile for the ARM architecture?

3:31 PM  
Blogger Unknown said...

I'm having trouble building the flite library too..

Can anyone upload their xcodeproject??? Or even just a
brief overview of how they did it??

4:20 PM  
Blogger Sam said...

Hi again. For those of you asking for an xcodeproject, I tried to give mine a more permanent home on bitbucket.

It's also up for now at cmang.org. -Sammy

3:56 AM  
Blogger andreasv said...

Great post. Thanks for everything!

First of all i have to state that i am new to iPhone platform.

My problem is that i am trying to use sam's code as the initial concept to create a new xcode project from scratch. I would like to know if anybody can help.

These are the steps i followed:

Created a new xcode project

In the targets in Compile Resources i added all the .c files i found in sam's project with also selecting copying them to my project

In the targets in Copy Bundle Resources i copied the 4 .txt files sam's has in his project

I also added AVFoundation.framework

I added a folder flite in other resources and in subfolder include I placed all the header files.

In subfolder src i included all .c files of sam's project.

I compiled and received almost 600 warnings.

Sam's project shows these warnings too. As i can see we cannot eliminate those warnings so we are ignoring them.

I had one build error in cst_string.c:

// Iphone destroy the universe:
unsigned char *cst_strdup(const unsigned char *str)
//char *cst_strdup(const char *s) {
{
    unsigned char *nstr = NULL;

    if (str)
    {
        nstr = cst_alloc(unsigned char,strlen((const char *)str)+1);
        memmove(nstr,str,strlen((const char *)str)+1);
    }
    return nstr;
}

where i had to correct the declaration of this function because it did not match the one i pasted here.

After that everything compiled ok. Unfortunately i cannot hear the sound or make it work as in sam's project.

Maybe i have to place some linking properties for the library i do not know where? I tried to find out from your Build Project Properties but with no luck. What am i missing?


Moreover can anybody explain where i have to place build flags. I am not sure where to look and change


Thanks anw!

2:18 PM  
Blogger Sam said...

@andreasv and anyone else, the version at bitbucket was reworked into a library class. It should be a lot easier to use now.

The next goal is use NSData memory instead of disk space for writing and reading the wave data, to make it cheaper (and faster?). :P

3:23 AM  
Anonymous Anonymous said...

You can also use the SDK sold by Acapela Group, with far better voices (but not free): http://www.acapela-group.com/acapela-tts-for-iphone-26-speech-solutions.html
Their API is close to the NSSpeechSynthesizer API

3:18 AM  
Anonymous manutencao iphone said...

 The 4.0 is coming out in ''summer'' but I can't find any specific date. Any hot tips? Wait for 4.0 official? Go to Akihabara or the Apple Store in Ginza and see if anyone can update it to the 4.0 Beta and then jailbreak? Is it possible? Are there any holes in my thinking? I really want to be able to use this thing!

8:12 PM  
Anonymous Emy said...

Espeak Engine is Also good for tts
http://crunchmodo.com/text-to-speech/

6:03 AM  
Blogger Unknown said...

Thank you for sharing your 'Insight'. As a Blogger and I've found it this blog post more informative and presenting the end result. Wordpress Development Singapore

2:04 AM  
Blogger Jack Stark said...

Comparison Charts are best to use when you are looking for something specific, say in case you need to buy something, this chart will help you look at their advantages, disadvantages, prices, reviews, features, etc. Thus, making it easier to choose between them easily. comparison chart in excel.

1:00 AM  
