Mini Tutorial: How to capture video of an iPhone app in Cocos2D, with audio


Okay, so I figured out how to add audio to my video.

In my previous blog post, I managed to take a video of my app and save it to a file. However, I am just stringing together screenshots of my app taken every 0.1 seconds, so it doesn’t capture the audio.

So I have a separate function that captures my audio (using AVAudioRecorder) and saves it to a file.

Now, to combine the files. Since iOS 4.1, AVFoundation has included a class called AVMutableComposition, which lets you build composites of media, for example combining a video file and audio files into a new video file that has audio.

Here are the code bits (I found parts of the code on Stack Overflow):

-(void) processVideo: (NSURL*) videoUrl
{
    AVURLAsset* videoAsset = [[AVURLAsset alloc] initWithURL:videoUrl options:nil];
    AVMutableComposition* mixComposition = [AVMutableComposition composition];
    AppDelegate *appDelegate = (AppDelegate *)[[UIApplication sharedApplication] delegate];
    NSError * error = nil;

    // One composition audio track per recorded clip, placed at that clip's start time
    for (NSMutableDictionary * audioInfo in appDelegate.audioInfoArray)
    {
        NSString *pathString = [[NSHomeDirectory() stringByAppendingString:@"/Documents/"] stringByAppendingString:[audioInfo objectForKey:@"fileName"]];
        AVURLAsset * urlAsset = [AVURLAsset URLAssetWithURL:[NSURL fileURLWithPath:pathString] options:nil];
        AVAssetTrack * audioAssetTrack = [[urlAsset tracksWithMediaType:AVMediaTypeAudio] objectAtIndex:0];
        AVMutableCompositionTrack *compositionAudioTrack = [mixComposition addMutableTrackWithMediaType:AVMediaTypeAudio
                                                                                       preferredTrackID:kCMPersistentTrackID_Invalid];
        NSLog(@"%lf", [[audioInfo objectForKey:@"startTime"] doubleValue]);
        // TIME_SCALE is your own constant: the number of CMTime ticks per second
        CMTime audioStartTime = CMTimeMake(([[audioInfo objectForKey:@"startTime"] doubleValue] * TIME_SCALE), TIME_SCALE);
        [compositionAudioTrack insertTimeRange:CMTimeRangeMake(kCMTimeZero, urlAsset.duration) ofTrack:audioAssetTrack atTime:audioStartTime error:&error];
    }

    // The screenshot video goes in as a single track starting at time zero
    AVMutableCompositionTrack *compositionVideoTrack = [mixComposition addMutableTrackWithMediaType:AVMediaTypeVideo
                                                                                   preferredTrackID:kCMPersistentTrackID_Invalid];
    [compositionVideoTrack insertTimeRange:CMTimeRangeMake(kCMTimeZero, videoAsset.duration)
                                   ofTrack:[[videoAsset tracksWithMediaType:AVMediaTypeVideo] objectAtIndex:0]
                                    atTime:kCMTimeZero error:nil];

    // The preset name was cut off in the original; AVAssetExportPresetHighestQuality is one valid choice
    AVAssetExportSession* _assetExport = [[AVAssetExportSession alloc] initWithAsset:mixComposition
                                                                          presetName:AVAssetExportPresetHighestQuality];
    NSString* videoName = @"";  // output file name (lost from the original; use your own)
    NSString *exportPath = [[self pathToDocumentsDirectory] stringByAppendingPathComponent:videoName];
    NSURL    *exportUrl = [NSURL fileURLWithPath:exportPath];
    // The export fails if a file already exists at the output path, so remove it first
    if ([[NSFileManager defaultManager] fileExistsAtPath:exportPath])
    {
        [[NSFileManager defaultManager] removeItemAtPath:exportPath error:nil];
    }
    _assetExport.outputFileType = @"";  // file type UTI (lost from the original), e.g. AVFileTypeQuickTimeMovie
    NSLog(@"file type %@", _assetExport.outputFileType);
    _assetExport.outputURL = exportUrl;
    _assetExport.shouldOptimizeForNetworkUse = YES;
    [_assetExport exportAsynchronouslyWithCompletionHandler:
     ^(void ) {
         switch (_assetExport.status)
         {
             case AVAssetExportSessionStatusCompleted:
                 //export complete
                 NSLog(@"Export Complete");
                 //[self uploadToYouTube];
                 break;
             case AVAssetExportSessionStatusFailed:
                 //export error (see exportSession.error)
                 NSLog(@"Export Failed");
                 NSLog(@"ExportSessionError: %@", [_assetExport.error localizedDescription]);
                 break;
             case AVAssetExportSessionStatusCancelled:
                 //export cancelled
                 NSLog(@"Export Cancelled");
                 NSLog(@"ExportSessionError: %@", [_assetExport.error localizedDescription]);
                 break;
             default:
                 break;
         }
     }];
}

I have more than one audio file that I want to combine with my video, so I created an array that holds information about each audio file (such as where the file is located and when to play it).
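To make the start-time arithmetic concrete, here is a small sketch in plain C (which also compiles as Objective-C). The struct is a hypothetical stand-in for one dictionary in audioInfoArray, and since the post doesn’t say what TIME_SCALE is, the example uses 600 ticks per second purely as an illustration.

```c
#include <stdint.h>

/* Hypothetical stand-in for one NSDictionary entry in audioInfoArray */
typedef struct {
    const char *fileName;  /* file name inside Documents/ */
    double startTime;      /* seconds from the start of the video */
} AudioClipInfo;

/* Same arithmetic as CMTimeMake(startTime * TIME_SCALE, TIME_SCALE):
   the start time becomes a whole number of ticks, truncated toward zero,
   at timeScale ticks per second. */
int64_t clipStartTicks(const AudioClipInfo *clip, int32_t timeScale)
{
    return (int64_t)(clip->startTime * timeScale);
}
```

A clip that starts 1.25 s into the video lands at tick 750 with a timescale of 600. Because the conversion truncates, a larger timescale places clips more precisely.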

And that’s it 🙂 You have a video of your app 🙂 with audio 🙂


Making that Talking App!


So you know Talking Tom, the cat on the iPhone that you can tickle and hit, and that repeats what you say in a squeaky voice? And his friends…

So I wrote a series of tutorials months ago about how to make your own Talking app.

Check out the tutorial series:

A couple of people have asked me for the sample project, particularly about how to set up Dirac. Since my previous project was work related, I obviously can’t give them that, so I sat down today and whipped up a really simple sample project that you guys can copy the Dirac setup from.

It includes all the stuff mentioned in the tutorials, such as how to record the player’s voice, how to detect when the player starts and stops talking, and how to process the recorded audio with Dirac and play it.

Here’s a screenie of the project:

No, it doesn’t have a fluffy animal, and when you test it out, you’ll find that it doesn’t even talk like a chipmunk. Whoever can guess why gets a, I don’t know, virtual pat on the head?

To make it sound like a chipmunk, just adjust the pitch (the value goes from 0 to 2: 0 being a really low voice, 1 leaving your voice unchanged, and 2 being, well, chipmunky).

Link to the project:

I’m providing the project as is. Oh, and I only got to test it on my MacBook Pro, because I don’t actually own an iOS device. So please test it for me?

For questions, comments, and the answer to why there is a red bowtie, just tweet, email, or Facebook me; my info is on the sidebar.



According to ManiacDev:

You may also need to tweak the threshold parameter in to get the recording working properly due to differences between the simulator and different iOS devices.

Thanks for pointing it out 🙂

He also mentioned:

Overall a pretty cool and useful audio effect. While you might have interest in building a Talking Tom app there are many other uses for this – such as changing the tempo of a song, or the key to make it easier to play or speeding up boring instructional audios without making the speaker sound like a chipmunk.

Tutorial: Step two to making a ‘Talking’ iPhone app: when to record and when to stop recording


This post is related to the following posts:

In a ‘Talking’ app, you say something and an animal repeats what you said in a cute voice.

Well, we can’t really ask the player to tap the animal to make it record. We want the animal to simply start recording when the player says something, stop recording when the player stops talking, and then play it back. So how do we detect that the player has stopped talking?

How do we start recording when we detect sound, and stop recording when we detect silence?

From Stack Overflow:

Perhaps you could use the AVAudioRecorder’s support for audio level metering to keep track of the audio levels and enable recording when the levels are above a given threshold. You’d need to enable metering with:

[anAVAudioRecorder setMeteringEnabled:YES];

and then you could periodically call:

[anAVAudioRecorder updateMeters];
float power = [anAVAudioRecorder averagePowerForChannel:0];
if (power > threshold && anAVAudioRecorder.recording == NO) {
    [anAVAudioRecorder record];
} else if (power < threshold && anAVAudioRecorder.recording == YES) {
    [anAVAudioRecorder stop];
}

Or something like that.


According to the API, averagePowerForChannel returns the average power, in decibels, of the sound being recorded. A return value of 0 means the recording is at full scale, the maximum power (like when someone shouts really, really loudly into the mic), while -160 is the minimum power, or near silence (which is what we want to detect, right?).

Another tutorial (Tutorial: Detecting When a User Blows into the Mic, by Dan Grigsby) uses peakPowerForChannel instead. He runs the peak power through a low-pass filter and keeps the result in lowPassResults:

From the tutorial:

Each time the timer’s callback method is triggered the lowPassResults level variable is recalculated. As a convenience, it’s converted to a 0-1 scale, where zero is complete quiet and one is full volume.

We’ll recognize someone as having blown into the mic when the low pass filtered level crosses a threshold. Choosing the threshold number is somewhat of an art. Set it too low and it’s easily triggered; set it too high and the person has to breath into the mic at gale force and at length. For my app’s need, 0.95 works.

- (void)listenForBlow:(NSTimer *)timer {
    [recorder updateMeters];

    const double ALPHA = 0.05;
    double peakPowerForChannel = pow(10, (0.05 * [recorder peakPowerForChannel:0]));
    lowPassResults = ALPHA * peakPowerForChannel + (1.0 - ALPHA) * lowPassResults;

    if (lowPassResults > 0.95)
        NSLog(@"Mic blow detected");
}



So I am using Dan’s algorithm, except for the threshold number: I’m still testing that out, and it really is somewhat of an art.

Okay, now we know when the player STOPS talking. What about when the player starts talking? We can’t tell that anymore once we’ve stopped recording after the player stops talking, right? We can’t get the channel power from a stopped recorder.

And Stack Overflow comes to the rescue again: I read somewhere that you should have TWO AVAudioRecorders instead of ONE. One AVAudioRecorder monitors the channel power at all times, and the other one actually records the player’s voice.

So we have:

NSURL *monitorTmpFile;
NSURL *recordedTmpFile;
AVAudioRecorder *recorder;
AVAudioRecorder *audioMonitor;

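And some variables to keep track of whether it is recording or playing, plus the state the monitor function needs between calls (the filtered level and how long it has been silent):

BOOL isRecording;
BOOL isPlaying;
double audioMonitorResults;
ccTime silenceTime;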

We have to initialize both recorders; somewhere in your init, add:

[self initAudioMonitor];
[self initRecorder];

The functions:

-(void) initAudioMonitor
{   NSError *error = nil;
    NSMutableDictionary* recordSetting = [[NSMutableDictionary alloc] init];
    [recordSetting setValue:[NSNumber numberWithInt:kAudioFormatAppleIMA4] forKey:AVFormatIDKey];
    [recordSetting setValue:[NSNumber numberWithFloat:44100.0] forKey:AVSampleRateKey];
    [recordSetting setValue:[NSNumber numberWithInt:1] forKey:AVNumberOfChannelsKey];
    NSArray* documentPaths = NSSearchPathForDirectoriesInDomains(NSDocumentDirectory, NSUserDomainMask, YES);
    NSString* fullFilePath = [[documentPaths objectAtIndex:0] stringByAppendingPathComponent:@"monitor.caf"];
    monitorTmpFile = [[NSURL fileURLWithPath:fullFilePath] retain];
    audioMonitor = [[AVAudioRecorder alloc] initWithURL:monitorTmpFile settings:recordSetting error:&error];
    [recordSetting release];
    if (!audioMonitor)
        NSLog(@"audioMonitor error: %@", [error localizedDescription]);
    [audioMonitor setMeteringEnabled:YES];
    [audioMonitor setDelegate:self];
    // The monitor records right away so its meters can be read
    [audioMonitor record];
}

-(void) initRecorder
{   NSError *error = nil;
    NSMutableDictionary* recordSetting = [[NSMutableDictionary alloc] init];
    [recordSetting setValue:[NSNumber numberWithInt:kAudioFormatAppleIMA4] forKey:AVFormatIDKey];
    [recordSetting setValue:[NSNumber numberWithFloat:44100.0] forKey:AVSampleRateKey];
    [recordSetting setValue:[NSNumber numberWithInt:1] forKey:AVNumberOfChannelsKey];
    NSArray* documentPaths = NSSearchPathForDirectoriesInDomains(NSDocumentDirectory, NSUserDomainMask, YES);
    NSString* fullFilePath = [[documentPaths objectAtIndex:0] stringByAppendingPathComponent:@"in.caf"];
    recordedTmpFile = [[NSURL fileURLWithPath:fullFilePath] retain];
    recorder = [[AVAudioRecorder alloc] initWithURL:recordedTmpFile settings:recordSetting error:&error];
    [recordSetting release];
    if (!recorder)
        NSLog(@"recorder error: %@", [error localizedDescription]);
    [recorder setMeteringEnabled:YES];
    [recorder setDelegate:self];
    // Only prepare here; recording starts in startRecording
    [recorder prepareToRecord];
}

Then we have a function that gets called all the time to monitor the AVAudioRecorders; call it somewhere in your update:

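-(void) monitorAudioController: (ccTime) dt
{   // Don't listen while the recorded voice is playing back
    if (!isPlaying)
    {   // The monitor is stopped while we record, so read the meters of whichever recorder is running
        AVAudioRecorder *activeRecorder = isRecording ? recorder : audioMonitor;
        [activeRecorder updateMeters];
        // Dan's low-pass filter, on a 0-1 scale, where zero is complete quiet and one is full volume
        const double ALPHA = 0.05;
        double peakPowerForChannel = pow(10, (0.05 * [activeRecorder peakPowerForChannel:0]));
        // audioMonitorResults and silenceTime are instance variables, so they keep their values between calls
        audioMonitorResults = ALPHA * peakPowerForChannel + (1.0 - ALPHA) * audioMonitorResults;
        NSLog(@"audioMonitorResults: %f", audioMonitorResults);
        if (audioMonitorResults > AUDIOMONITOR_THRESHOLD)
        {   NSLog(@"Sound detected");
            silenceTime = 0;    // sound is back, so any pause between words is over
            if (!isRecording)
            {   [audioMonitor stop];
                [self startRecording];
            }
        }   else
        {   NSLog(@"Silence detected");
            if (isRecording)
            {   if (silenceTime > MAX_SILENCETIME)
                {   NSLog(@"Next silence detected");
                    [audioMonitor stop];
                    [self stopRecordingAndPlay];
                    silenceTime = 0;
                }   else
                {   silenceTime += dt;
                }
            }
        }
        // Restart the monitor now and then so its file doesn't keep growing
        if (!isRecording && [audioMonitor currentTime] > MAX_MONITORTIME)
        {   [audioMonitor stop];
            [audioMonitor record];
        }
    }
}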

Okay, lemme explain…

You have to call [audioMonitor updateMeters] because, according to the AVAudioRecorder class reference, it:

Refreshes the average and peak power values for all channels of an audio recorder.

And then, do you see Dan’s algorithm?

const double ALPHA = 0.05;
double peakPowerForChannel = pow(10, (0.05 * [audioMonitor peakPowerForChannel:0]));
audioMonitorResults = ALPHA * peakPowerForChannel + (1.0 - ALPHA) * audioMonitorResults;

NSLog(@"audioMonitorResults: %f", audioMonitorResults);

If audioMonitorResults is greater than our threshold, AUDIOMONITOR_THRESHOLD (finding this value takes many hours of testing and monitoring, which is why I have an NSLog there), we have detected sound. And we start recording!

if (!isRecording)
{   [audioMonitor stop];
    [self startRecording];
}

If it isn’t already recording, we stop the audio monitor and start recording:

-(void) startRecording
{   NSLog(@"startRecording");
    isRecording = YES;
    [recorder record];
}

Okay then, if audioMonitorResults is less than AUDIOMONITOR_THRESHOLD and we are recording, silence has been detected. But, but, but: we don’t stop recording right away. Why…? Because when people talk, we say “Hello, how are you?” rather than “Hellohowareyou”, and the gaps between words are also detected as silence. That’s why:

if (silenceTime > MAX_SILENCETIME)
{   NSLog(@"Next silence detected");
    [audioMonitor stop];
    [self stopRecordingAndPlay];
    silenceTime = 0;
}   else
{   silenceTime += dt;
}

MAX_SILENCETIME is the threshold for the silence between words: only a pause longer than that counts as the player having stopped talking.
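The whole start/stop decision can be modelled as a tiny state machine. This is a hypothetical plain-C sketch of the logic in monitorAudioController, not the author’s code: it takes one filtered level per tick, its thresholds are example values, and it resets the silence timer whenever sound comes back, so only one continuous pause longer than maxSilence ends the recording.

```c
#include <stdbool.h>

/* Hypothetical plain-C model of the start/stop decision in
   monitorAudioController. */
typedef enum { TALK_NONE, TALK_START, TALK_STOP } TalkEvent;

typedef struct {
    bool isRecording;
    double silenceTime;  /* seconds of continuous silence while recording */
} TalkDetector;

/* Feed one filtered level (0-1) and the seconds since the last tick. */
TalkEvent talkDetectorUpdate(TalkDetector *d, double level, double dt,
                             double threshold, double maxSilence)
{
    if (level > threshold) {
        d->silenceTime = 0.0;          /* speech is back: reset the pause timer */
        if (!d->isRecording) {
            d->isRecording = true;     /* sound detected: start recording */
            return TALK_START;
        }
        return TALK_NONE;
    }
    if (d->isRecording) {
        if (d->silenceTime > maxSilence) {
            d->isRecording = false;    /* pause too long: stop and play */
            d->silenceTime = 0.0;
            return TALK_STOP;
        }
        d->silenceTime += dt;          /* maybe just a gap between words */
    }
    return TALK_NONE;
}
```

With dt = 0.25 s and maxSilence = 0.6 s, a half-second gap between words keeps the recording going, and the recording stops on the fourth silent tick in a row.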

And then, to make sure the audioMonitor’s output file doesn’t explode in size:

if ([audioMonitor currentTime] > MAX_MONITORTIME)
{   [audioMonitor stop];
    [audioMonitor record];
}

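Once the monitor has been recording for longer than MAX_MONITORTIME, this stops it (which closes the file) and starts it again, so monitor.caf is rewritten from scratch instead of growing forever.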
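How big can the monitor file get? Here is a rough plain-C estimate, assuming the IMA4 settings used above (44.1 kHz, mono) and ignoring the CAF header: Apple’s IMA4 stores 64 sample frames per channel in each 34-byte packet. The post doesn’t give MAX_MONITORTIME’s real value, so the 60 seconds used below is only an example, and the function name is my own.

```c
/* Rough size in bytes of an IMA4 (kAudioFormatAppleIMA4) recording,
   ignoring the CAF file header. Apple's IMA4 packs 64 sample frames
   per channel into a 34-byte packet. */
long ima4ApproxBytes(double seconds, double sampleRate, int channels)
{
    long frames = (long)(seconds * sampleRate);
    long packets = (frames + 63) / 64;   /* round up to whole packets */
    return packets * 34L * channels;
}
```

That comes to about 23 KB per second, or roughly 1.4 MB for a 60-second monitor file, about a quarter of the size of uncompressed 16-bit audio.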

And then stopRecordingAndPlay:

-(void) stopRecordingAndPlay
{   NSLog(@"stopRecording Record time: %f", [recorder currentTime]);
    // Only play recordings longer than MIN_RECORDTIME (your own constant)
    if ([recorder currentTime] > MIN_RECORDTIME)
    {   isRecording = NO;
        [recorder stop];
        isPlaying = YES;
        // insert code for playing the audio here
    }   else
    {   // too short: go back to listening
        [audioMonitor record];
    }
}

After the audio is played, call:

-(void) stopPlaying
{   isPlaying = NO;
    [audioMonitor record];
}

And there we go! 🙂

To summarize:

  • 1 AVAudioRecorder to monitor when the player starts talking and stops talking
  • 1 AVAudioRecorder to record the player’s voice
  • Use peakPowerForChannel to detect talking or silence

And that’s about it!

Tutorial: Other ways to chipmunkify your voice


Related to this post: Tutorial: The first step to making a ‘Talking’ iPhone app, chipmunkifying your voice!

There are dozens of “Talking” apps on the iPhone App Store, as I’ve mentioned before. Basically, you say something, and then the animal repeats it in a chipmunk-like voice. But even though they are different apps, and certainly different animals (hippo, bird, cat, giraffe, duh), some of them share the same voices! Why does that adorable hippo sound like the cat?!

The solution I posted in my previous blog post simply uses CocosDenshion to shift the recorded voice (your voice) to a higher or lower pitch to produce the voice of the animal (chipmunk). But the flaw of that solution is that if you change the pitch, you also change the speed. So if you lower the pitch, you get a really low voice that is played really sloooow.
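The coupling is simple arithmetic. Raising the pitch by playing the samples faster (resampling, which is what the CocosDenshion approach does) shortens the clip by the same factor. A minimal plain-C sketch, with a function name of my own:

```c
/* Duration of a clip whose pitch is changed by resampling: playing the
   samples `pitch` times faster raises the pitch and shortens the clip by
   the same factor (pitch 1.0 = unchanged). */
double resampledDurationSeconds(double originalSeconds, double pitch)
{
    return originalSeconds / pitch;
}
```

A 3-second recording played at pitch 2.0 lasts 1.5 seconds; at pitch 0.5 it drags on for 6 seconds. What we want instead is to change the pitch while the clip stays 3 seconds long.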

And I don’t want that: I want to change the pitch without changing the speed. So I need a different solution, and that solution is Dirac3.

According to its website:

DIRAC redefines the limits of what todays’ technology can do for your application if you want to change the speed and pitch of music independently without sacrificing quality. Used by leading hi-end audio processing applications and custom built solutions in studios around the world, DIRAC is our critically acclaimed time stretching and pitch shifting technology that can be applied both musically monophonic and polyphonic signals with equal ease and success. Its straightforward API makes it an ideal tool for any software project that requires time stretching and pitch shifting.

Basically Dirac allows you to change the pitch of your audio, without speeding it up or slowing it down.

Dirac 3 has a free version called Dirac LE, which you can simply download from their website. Dirac LE is also available for iPhone/iPad (ARMv6 and ARMv7 compliant; Xcode, iOS 3.2+ and iOS 4+).

Okay, download Dirac LE, and then let’s get started (oh, I am setting up mine as I write this blog post, as well).

According to the “iPhone ReadMe (or suffer).rtf” that comes with the zip file, we need to add the vecLib/Accelerate frameworks to the project. Go to Frameworks, right-click, choose Add Existing Frameworks, then look for “Accelerate.framework” and add it. Oh, and any file that contains Dirac calls needs to be .mm instead of .m.

Then use Add Existing Files to add “Dirac.h” and “libDIRAC_iOS4-fat.a” to your project.

I will be using the Time Stretching Example (also in the zip file) as my guide. Oh, the zip file also contains a 32-page PDF explaining Dirac.

From the Time Stretching Example, copy the files in the ExtAudioFile folder: EAFRead.h and EAFWrite.h, plus their implementation files. These are the classes we use to read and write the audio files that Dirac processes.

Then create a new class (mine is the AudioProcessor below); take note that its implementation file is .mm, because it will be calling Dirac. Basically, I just copy-pasted most of the code from the example’s iPhoneTestAppDelegate. (Guilty of being a copy-paste coder.)

Then edit some stuff. AudioProcessor.h:

#import <Foundation/Foundation.h>
#import <AVFoundation/AVFoundation.h>
#import "EAFRead.h"
#import "EAFWrite.h"

@interface AudioProcessor : NSObject <AVAudioPlayerDelegate>
{   AVAudioPlayer *player;
    float percent;
    NSURL *inUrl;
    NSURL *outUrl;
    EAFRead *reader;
    EAFWrite *writer;
}

@property (readonly) EAFRead *reader;

- (void) initAudioProcessor: (NSURL*) filePath;

@end


And then edit some more stuff in the implementation (.mm) file:

#include "Dirac.h"
#include <stdio.h>
#include <sys/time.h>

#import <AVFoundation/AVAudioPlayer.h>
#import <AVFoundation/AVFoundation.h>

#import "AudioProcessor.h"
#import "EAFRead.h"
#import "EAFWrite.h"

double gExecTimeTotal = 0.;

void DeallocateAudioBuffer(float **audio, int numChannels)
{
    if (!audio) return;
    for (long v = 0; v < numChannels; v++) {
        if (audio[v]) {
            free(audio[v]);
            audio[v] = NULL;
        }
    }
    free(audio);
    audio = NULL;
}

float **AllocateAudioBuffer(int numChannels, int numFrames)
{
    // Allocate buffer for output: one zero-filled float array per channel
    float **audio = (float**)malloc(numChannels*sizeof(float*));
    if (!audio) return NULL;
    memset(audio, 0, numChannels*sizeof(float*));
    for (long v = 0; v < numChannels; v++) {
        audio[v] = (float*)malloc(numFrames*sizeof(float));
        if (!audio[v]) {
            DeallocateAudioBuffer(audio, numChannels);
            return NULL;
        }
        else memset(audio[v], 0, numFrames*sizeof(float));
    }
    return audio;
}

/*
 This is the callback function that supplies data from the input stream/file whenever needed.
 It should be implemented in your software by a routine that gets data from the input/buffers.
 The read requests are *always* consecutive, i.e. the routine will never have to supply data out
 of order.
 */
long myReadData(float **chdata, long numFrames, void *userData)
{
    // The userData parameter can be used to pass information about the caller (for example, "self") to
    // the callback so it can manage its audio streams.
    if (!chdata)    return 0;
    AudioProcessor *Self = (AudioProcessor*)userData;
    if (!Self)    return 0;
    // we want to exclude the time it takes to read in the data from disk or memory, so we stop the clock until
    // we've read in the requested amount of data
    gExecTimeTotal += DiracClockTimeSeconds();    // ... stop timer ...
    OSStatus err = [Self.reader readFloatsConsecutive:numFrames intoArray:chdata];
    DiracStartClock();                            // ... start timer ...
    return err;
}

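@implementation AudioProcessor

@synthesize reader;

// Plays the processed file; called on the main thread once processing is done
-(void) playOnMainThread: (id) param
{
    NSError *error = nil;
    [player release];    // drop the previous player, if any
    player = [[AVAudioPlayer alloc] initWithContentsOfURL:outUrl error:&error];
    if (error)
        NSLog(@"AVAudioPlayer error %@, %@", error, [error userInfo]);
    player.delegate = self;
    [player play];
}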

// Runs on a background thread: processes inUrl with Dirac and writes the result to outUrl
-(void) processThread: (id) param
{
    NSAutoreleasePool *pool = [[NSAutoreleasePool alloc] init];
    long numChannels = 1;        // DIRAC LE allows mono only
    float sampleRate = 44100.;
    // open input file
    [reader openFileForRead:inUrl sr:sampleRate channels:numChannels];
    // create output file (overwrite if exists)
    [writer openFileForWrite:outUrl sr:sampleRate channels:numChannels wordLength:16 type:kAudioFileAIFFType];
    // DIRAC parameters
    // Here we set our time and pitch manipulation values
    float time      = 1.0;
    float pitch     = 1.0;
    float formant   = 1.0;
    // First we set up DIRAC to process numChannels of audio at 44.1kHz
    // N.b.: The fastest option is kDiracLambdaPreview / kDiracQualityPreview, best is kDiracLambda3, kDiracQualityBest
    // The probably best *default* option for general purpose signals is kDiracLambda3 / kDiracQualityGood
    void *dirac = DiracCreate(kDiracLambdaPreview, kDiracQualityPreview, numChannels, 44100., &myReadData, (void*)self);
    //    void *dirac = DiracCreate(kDiracLambda3, kDiracQualityBest, numChannels, 44100., &myReadData, (void*)self);
    if (!dirac) {
        printf("!! ERROR !!\n\n\tCould not create DIRAC instance\n\tCheck number of channels and sample rate!\n");
        printf("\n\tNote that the free DIRAC LE library supports only\n\tone channel per instance\n\n\n");
        [pool release];
        return;
    }
    // Pass the values to our DIRAC instance
    DiracSetProperty(kDiracPropertyTimeFactor, time, dirac);
    DiracSetProperty(kDiracPropertyPitchFactor, pitch, dirac);
    DiracSetProperty(kDiracPropertyFormantFactor, formant, dirac);
    // upshifting pitch will be slower, so in this case we'll enable constant CPU pitch shifting
    if (pitch > 1.0)
        DiracSetProperty(kDiracPropertyUseConstantCpuPitchShift, 1, dirac);
    // Print our settings to the console
    NSLog(@"Running DIRAC version %s\nStarting processing", DiracVersion());
    // Get the number of frames from the file to display our simplistic progress bar
    SInt64 numf = [reader fileNumFrames];
    SInt64 outframes = 0;
    SInt64 newOutframe = numf*time;
    long lastPercent = -1;
    percent = 0;
    // This is an arbitrary number of frames per call. Change as you see fit
    long numFrames = 8192;
    // Allocate buffer for output
    float **audio = AllocateAudioBuffer(numChannels, numFrames);
    double bavg = 0;
    for(;;) {
        // Display ASCII style "progress bar"
        percent = 100.f*(double)outframes / (double)newOutframe;
        long ipercent = percent;
        if (lastPercent != ipercent) {
            //[self performSelectorOnMainThread:@selector(updateBarOnMainThread:) withObject:self waitUntilDone:NO];
            printf("\rProgress: %3ld%% [%-40s] ", ipercent, &"||||||||||||||||||||||||||||||||||||||||"[40 - ((ipercent>100)?40:(2*ipercent/5))] );
            lastPercent = ipercent;
        }
        DiracStartClock();                            // ... start timer ...
        // Call the DIRAC process function with current time and pitch settings
        // Returns: the number of frames in audio
        long ret = DiracProcess(audio, numFrames, dirac);
        bavg += (numFrames/sampleRate);
        gExecTimeTotal += DiracClockTimeSeconds();    // ... stop timer ...
        printf("x realtime = %3.3f : 1 (DSP only), CPU load (peak, DSP+disk): %3.2f%%\n", bavg/gExecTimeTotal, DiracPeakCpuUsagePercent(dirac));
        // Process only as many frames as needed
        long framesToWrite = numFrames;
        unsigned long nextWrite = outframes + numFrames;
        if (nextWrite > newOutframe) framesToWrite = numFrames - nextWrite + newOutframe;
        if (framesToWrite < 0) framesToWrite = 0;
        // Write the data to the output file
        [writer writeFloats:framesToWrite fromArray:audio];
        // Increase our counter for the progress bar
        outframes += numFrames;
        // As soon as we've written enough frames we exit the main loop
        if (ret <= 0) break;
    }
    percent = 100;
    //[self performSelectorOnMainThread:@selector(updateBarOnMainThread:) withObject:self waitUntilDone:NO];
    // Free buffer for output
    DeallocateAudioBuffer(audio, numChannels);
    // destroy DIRAC instance
    DiracDestroy( dirac );
    // Done!
    [reader release];
    reader = nil;
    [writer release]; // important - flushes data to file
    writer = nil;
    // start playback on main thread
    [self performSelectorOnMainThread:@selector(playOnMainThread:) withObject:self waitUntilDone:NO];
    [pool release];
}

- (void)audioPlayerDidFinishPlaying:(AVAudioPlayer *)aPlayer successfully:(BOOL)flag
{
    // Playback of the processed file has finished
}

- (void) initAudioProcessor: (NSURL*) filePath
{   NSLog(@"initAudioProcessor");
    //NSString *inputSound  = [[[NSBundle mainBundle] pathForResource:  @"voice" ofType: @"aif"] retain];
    NSString *outputSound = [[NSHomeDirectory() stringByAppendingString:@"/Documents/"] stringByAppendingString:@"out.aif"];
    inUrl = [filePath retain];
    outUrl = [[NSURL fileURLWithPath:outputSound] retain];
    reader = [[EAFRead alloc] init];
    writer = [[EAFWrite alloc] init];
    // this thread does the processing
    [NSThread detachNewThreadSelector:@selector(processThread:) toTarget:self withObject:nil];
}

- (void)dealloc
{
    [player release];
    [inUrl release];
    [outUrl release];
    [super dealloc];
}

@end
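The end of processThread’s loop decides how many frames to write: the output should be numf * time frames long, so the last chunk of numFrames gets trimmed. Below is a hypothetical plain-C model of just that bookkeeping (the function name is my own). Unlike the real loop, which stops when DiracProcess returns 0 or less, this sketch simply stops once the target length is reached.

```c
/* Hypothetical model of the write bookkeeping in processThread: the
   output should hold inputFrames * timeFactor frames, Dirac produces
   chunkFrames at a time, and the final chunk is trimmed to fit.
   Returns the total number of frames written. */
long long totalFramesWritten(long long inputFrames, double timeFactor,
                             long chunkFrames)
{
    long long target = (long long)(inputFrames * timeFactor);  /* newOutframe */
    long long outframes = 0;
    long long written = 0;
    while (outframes < target) {
        long long framesToWrite = chunkFrames;
        long long nextWrite = outframes + chunkFrames;
        if (nextWrite > target) framesToWrite = chunkFrames - nextWrite + target;
        if (framesToWrite < 0) framesToWrite = 0;
        written += framesToWrite;   /* [writer writeFloats:...] in the original */
        outframes += chunkFrames;
    }
    return written;
}
```

For a 100,000-frame input with time = 1.0 and 8192-frame chunks, twelve full chunks (98,304 frames) are written and the thirteenth is trimmed to 1,696 frames.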

In my AudioController (code in the previous blog post), create an AudioProcessor object:

AudioProcessor *audioProcessor = [[AudioProcessor alloc] init];
// initAudioProcessor: expects an NSURL, so pass the file URL itself rather than a string
[audioProcessor initAudioProcessor: recordedTmpFiles[recordedTmpFileIdx]];

recordedTmpFiles[recordedTmpFileIdx] is the file URL of the recorded audio.

Just adjust float pitch to change, well, the pitch of your voice. For a chipmunky voice, set the pitch > 1; for a low voice, set it < 1.

And that’s it 🙂

Tutorial: The first step to making a ‘Talking’ iPhone app, chipmunkifying your voice!


Yah, I invented a word, “chipmunkifying”: making your voice sound like a chipmunk. Why? ‘Talking’ apps are all over the iPhone App Store: there’s Talking Tom Cat, a talking bird, a talking hippo, a talking giraffe, whatever animal you can think of… Basically, you say something, and then the animal repeats it in a chipmunk-like voice. Oh, and you can poke, tickle, or hit it too, or whatever…

Oh, I am using Cocos2D as my game engine. So, begin by creating a project using the Cocos2D application template. If you are not familiar with Cocos2D, you can go to its website for instructions on how to download and install it.

Well, the first step to a ‘Talking’ app is, of course, recording what you say.

I’ll be using AVAudioRecorder to record my voice. It’s really simple to set up: just follow the instructions on this blog by Jake Wyman. But he uses the plain iPhone SDK, while I will be using Cocos2D, so just follow his tutorial up to the part about adding frameworks:

From your XCode interface you are going to select the Frameworks folder, ctl>click and choose ADD and then select Existing Frameworks… Then choose both the CoreAudio.Framework and the AVFoundation.Framework

Once that’s done, some coding… or rather, copy-pasting some of Jake’s code.

First create an NSObject subclass named AudioController.

Import AVFoundation and CoreAudio:

#import <AVFoundation/AVFoundation.h>
#import <CoreAudio/CoreAudioTypes.h>

Set up AudioController as an AVAudioRecorderDelegate, and declare an AVAudioRecorder and a URL, recordedTmpFile (the file where we will temporarily store our audio).

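@interface AudioController : NSObject <AVAudioRecorderDelegate>
{   AVAudioRecorder * recorder;
    NSURL *recordedTmpFile;   // an NSURL, since record assigns it a file URL
}

- (void) initAudioController;
- (void) record;
- (void) play;
- (void) stopRecording;
- (void) unloadAudioController;

@end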

We set up the shared AVAudioSession in a function called initAudioController (basically the code inside Jake’s viewDidLoad):

- (void) initAudioController
{   NSError *error = nil;
    //Get the shared instance of the AVAudioSession object.
    AVAudioSession * audioSession = [AVAudioSession sharedInstance];
    //Set up the audio session for playback and record.
    //We could just use record and then switch it to playback later, but
    //since we are going to do both, let's set it up once.
    [audioSession setCategory:AVAudioSessionCategoryPlayAndRecord error:&error];
    //Activate the session
    [audioSession setActive:YES error:&error];
}

And then our record function:

-(void) record
{   NSError *error = nil;
    NSMutableDictionary* recordSetting = [[NSMutableDictionary alloc] init];
    [recordSetting setValue:[NSNumber numberWithInt:kAudioFormatAppleIMA4] forKey:AVFormatIDKey];
    [recordSetting setValue:[NSNumber numberWithFloat:44100.0] forKey:AVSampleRateKey];
    [recordSetting setValue:[NSNumber numberWithInt:2] forKey:AVNumberOfChannelsKey];
    [recordedTmpFile release];
    recordedTmpFile = [[NSURL fileURLWithPath:[NSTemporaryDirectory() stringByAppendingPathComponent:@"recording.caf"]] retain];
    [recorder release];    // drop the previous recorder, if any
    recorder = [[AVAudioRecorder alloc] initWithURL:recordedTmpFile settings:recordSetting error:&error];
    [recordSetting release];
    [recorder setDelegate:self];
    [recorder prepareToRecord];
    [recorder record];
}

And now, to play back whatever was recorded in a chipmunk voice. In Jake’s project he uses AVAudioPlayer to play his sound, but that isn’t going to work for me, because AVAudioPlayer doesn’t let me change the playback speed.

So instead, I will be using CocosDenshion’s CDSoundEngine. I am reading Chapter 9, “Playing Sounds With CocosDenshion”, of Cocos2d for iPhone 0.99 Beginner’s Guide:

According to Pablo Ruiz, we need to add more frameworks to get CDSoundEngine working:

… include OpenAL and AudioToolbox frameworks in your project.

More imports:

#import "cocos2d.h"
#import "CocosDenshion.h"

And then declare a CDSoundEngine:

CDSoundEngine *soundEngine;

In the initAudioController function, we initialize the soundEngine:

soundEngine = [[CDSoundEngine alloc] init: kAudioSessionCategory_PlayAndRecord];

NSArray *defs = [NSArray arrayWithObjects: [NSNumber numberWithInt:1],nil];
[soundEngine defineSourceGroups:defs];

And then we play:

-(void) play
{   NSString *filePath = [NSTemporaryDirectory() stringByAppendingPathComponent:@"recording.caf"];
    // Sound (buffer) id 0: we only ever load one recording at a time
    [soundEngine loadBuffer:0 filePath:filePath];
    [soundEngine playSound:0 sourceGroupId:0 pitch:2.0f pan:0.0f gain:1.0f loop:NO];
}

Take note of the pitch parameter: it says 2.0f. What does that mean? Normal pitch is 1.0f. If you increase the value, you get a higher pitch, also known as a chipmunked voice; if you decrease it, you get a low, kind of creepy voice.

We also need a function that stops the recording and then starts playing:

-(void) stopRecording
{   [recorder stop];
    [self play];
}

And then we make a function for unloading the AudioController:

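- (void) unloadAudioController
{   NSError *error = nil;
    NSFileManager * fm = [NSFileManager defaultManager];
    // delete the temporary recording
    [fm removeItemAtPath:[recordedTmpFile path] error:&error];
    [recorder release];    // release it; never call dealloc directly
    recorder = nil;
    [recordedTmpFile release];
    recordedTmpFile = nil;
    [soundEngine release];
}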


Okay, now that AudioController is done, it’s time to use it in our HelloWorld scene. Yes, the HelloWorld that comes with the default template. I also added a BOOL isRecording to keep track of whether we are recording or playing.


#import "cocos2d.h"
#import "AudioController.h"

// HelloWorld Layer
@interface HelloWorld : CCLayer
{   AudioController *audioLayer;
    BOOL isRecording;   // a plain BOOL, not a pointer
}

+(id) scene;

@end


For HelloWorld.m, in init, register the layer for targeted touches (swallowing them), and change the “Hello World” label to “Say something…”, “Speak!”, “Talk to me”, or whatever.

[[CCTouchDispatcher sharedDispatcher] addTargetedDelegate:self priority:0 swallowsTouches:YES];
CCLabel* label = [CCLabel labelWithString:@"Say something..." fontName:@"Marker Felt" fontSize:32];

Also in init, initialize audioLayer and set isRecording to NO:

audioLayer = [[AudioController alloc] init];
[audioLayer initAudioController];

isRecording = NO;

And then, since I’m too lazy to add buttons, the user simply taps anywhere on the screen once to start recording, then taps again to stop recording and play the audio.

- (BOOL)ccTouchBegan:(UITouch *)touch withEvent:(UIEvent *)event
{   if (isRecording)
    {   [audioLayer stopRecording];
        isRecording = NO;
    }
    else
    {   [audioLayer record];
        isRecording = YES;
    }
    // Returning YES claims the touch for this layer
    return YES;
}

And in HelloWorld’s dealloc add:

[audioLayer unloadAudioController];

And that’s it 🙂

You can record your voice and play to sound like a chipmunk 🙂

For any questions (or if you find any errors), feel free to contact me through Twitter, here, email, Facebook or whatever.