Mini Tutorial: How to capture video of iPhone app in Cocos2D? with audio

Okay, so I figured out how to add audio to my video.

In my previous blog post, I managed to take a video of my app and save it to a file. However, I am just stringing together screenshots of my app taken every 0.1 seconds, so it doesn’t capture the audio.

So I have a different function that is capturing my audio (AVAudioRecorder), and saving that into a file.

Now, to combine the files together. Since iOS 4.1, AVFoundation included this thing called AVMutableComposition, and with that you can make composites of stuff, like combine video and audio files together to make a new video file that has audio.

So code bits (I found bits of the code in StackOverflow):

-(void) processVideo: (NSURL*) videoUrl
{
    AVURLAsset* videoAsset = [[AVURLAsset alloc] initWithURL:videoUrl options:nil];
    AVMutableComposition* mixComposition = [AVMutableComposition composition];
    AppDelegate *appDelegate = (AppDelegate *)[[UIApplication sharedApplication] delegate];
    NSError *error = nil;

    // Add one audio track per recorded clip, placed at the time the clip should start playing
    for (NSMutableDictionary *audioInfo in appDelegate.audioInfoArray)
    {
        NSString *pathString = [[NSHomeDirectory() stringByAppendingString:@"/Documents/"] stringByAppendingString:[audioInfo objectForKey:@"fileName"]];
        AVURLAsset *urlAsset = [AVURLAsset URLAssetWithURL:[NSURL fileURLWithPath:pathString] options:nil];
        AVAssetTrack *audioAssetTrack = [[urlAsset tracksWithMediaType:AVMediaTypeAudio] objectAtIndex:0];
        AVMutableCompositionTrack *compositionAudioTrack = [mixComposition addMutableTrackWithMediaType:AVMediaTypeAudio
                                                                                       preferredTrackID:kCMPersistentTrackID_Invalid];
        NSLog(@"%lf", [[audioInfo objectForKey:@"startTime"] doubleValue]);
        CMTime audioStartTime = CMTimeMake(([[audioInfo objectForKey:@"startTime"] doubleValue]*TIME_SCALE), TIME_SCALE);
        [compositionAudioTrack insertTimeRange:CMTimeRangeMake(kCMTimeZero, urlAsset.duration) ofTrack:audioAssetTrack atTime:audioStartTime error:&error];
    }

    // Add the silent video as its own track, starting at time zero
    AVMutableCompositionTrack *compositionVideoTrack = [mixComposition addMutableTrackWithMediaType:AVMediaTypeVideo
                                                                                   preferredTrackID:kCMPersistentTrackID_Invalid];
    [compositionVideoTrack insertTimeRange:CMTimeRangeMake(kCMTimeZero, videoAsset.duration)
                                   ofTrack:[[videoAsset tracksWithMediaType:AVMediaTypeVideo] objectAtIndex:0]
                                    atTime:kCMTimeZero error:nil];

    // The preset name is missing from this listing; AVAssetExportPresetPassthrough is an assumption
    AVAssetExportSession* _assetExport = [[AVAssetExportSession alloc] initWithAsset:mixComposition
                                                                          presetName:AVAssetExportPresetPassthrough];
    NSString *videoName = @"";   // set this to the name of the output file
    NSString *exportPath = [[self pathToDocumentsDirectory] stringByAppendingPathComponent:videoName];
    NSURL    *exportUrl = [NSURL fileURLWithPath:exportPath];

    // The export fails if the output file already exists, so remove any old one first
    if ([[NSFileManager defaultManager] fileExistsAtPath:exportPath])
    {
        [[NSFileManager defaultManager] removeItemAtPath:exportPath error:nil];
    }

    _assetExport.outputFileType = @"";   // set this to the output file type, e.g. AVFileTypeQuickTimeMovie
    NSLog(@"file type %@", _assetExport.outputFileType);
    _assetExport.outputURL = exportUrl;
    _assetExport.shouldOptimizeForNetworkUse = YES;
    [_assetExport exportAsynchronouslyWithCompletionHandler:
     ^(void) {
         switch (_assetExport.status)
         {
             case AVAssetExportSessionStatusCompleted:
                 //export complete
                 NSLog(@"Export Complete");
                 //[self uploadToYouTube];
                 break;
             case AVAssetExportSessionStatusFailed:
                 //export error (see exportSession.error)
                 NSLog(@"Export Failed");
                 NSLog(@"ExportSessionError: %@", [_assetExport.error localizedDescription]);
                 break;
             case AVAssetExportSessionStatusCancelled:
                 //export cancelled
                 NSLog(@"Export Cancelled");
                 NSLog(@"ExportSessionError: %@", [_assetExport.error localizedDescription]);
                 break;
             default:
                 break;
         }
     }];
}

I have more than one audio file that I want to combine with my video, so I created an array that contains information for each of the audio files (such as where the file is located and when to play that audio).

And that’s it 🙂 You have a video of your app 🙂 with audio 🙂

Tutorial: Other ways to chipmunkify your voice

Related to this post: Tutorial: The first step to making a ‘Talking’ iPhone app, chipmunkifying your voice!

There are dozens of “Talking” apps on the iPhone app store, as I’ve mentioned before. Basically what those apps do is, you say something, and then the animal will repeat it, in this chipmunk like voice. But even though they are different apps, and certainly different animals (hippo, bird, cat, giraffe, duh), some of them share the same voice! Why does that adorable hippo sound like the cat?!

The solution I posted in my previous blog post is to simply use CocosDenshion to shift the recorded voice (your voice) to a higher or lower pitch to produce the voice of the animal (chipmunk). But the flaw of that solution is that if you change the pitch, you also change the speed. So if you lower the pitch, you get this really low voice that is played back really sloooow.

And I don’t want that. I want to change the pitch but not the speed. So I need a different solution, and that solution is Dirac3.

According to its website:

DIRAC redefines the limits of what todays’ technology can do for your application if you want to change the speed and pitch of music independently without sacrificing quality. Used by leading hi-end audio processing applications and custom built solutions in studios around the world, DIRAC is our critically acclaimed time stretching and pitch shifting technology that can be applied both musically monophonic and polyphonic signals with equal ease and success. Its straightforward API makes it an ideal tool for any software project that requires time stretching and pitch shifting.

Basically Dirac allows you to change the pitch of your audio, without speeding it up or slowing it down.

Dirac 3 has a free version, called Dirac LE, which you can simply download from their website. Dirac LE is also available for iPhone/iPad, ARM 6 and 7 compliant (Xcode, iOS 3.2+ and iOS4+).

Okay, download Dirac LE, and then let’s get started (oh, I am setting up mine as I write this blog post, as well).

According to the “iPhone ReadMe (or suffer).rtf” that came with the zip file, we need to include the vecLib/Accelerate frameworks in our project. Go to Frameworks, right click, Add Existing Frameworks, and then look for “Accelerate.framework” and add it. Oh, and any file that will contain Dirac calls needs to be .mm, instead of .m.

And then Add Existing Files, and add “Dirac.h” and “libDIRAC_iOS4-fat.a” to your project.

I will be using the Time Stretching Example as my guide, (it’s also in the zip file). Oh, the zip file also contains a 32 page pdf file explaining Dirac.

From the Time Stretching Example, copy the files in the ExtAudioFile folder, including EAFRead.h and EAFWrite.h. These are the files Dirac will use to read and write audio files.

And then we create a new file for a class I’m calling AudioProcessor. Take note that the implementation file is .mm, because it will be calling Dirac. And basically I just copy pasted most of the code from iPhoneTestAppDelegate of the example. (Guilty of being a copy paste coder.)

And then edit some stuff, so AudioProcessor.h:

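#import <Foundation/Foundation.h>
#import <AVFoundation/AVFoundation.h>   // for AVAudioPlayer and AVAudioPlayerDelegate
#import "EAFRead.h"
#import "EAFWrite.h"

@interface AudioProcessor : NSObject <AVAudioPlayerDelegate>
{
    AVAudioPlayer *player;
    float percent;
    NSURL *inUrl;
    NSURL *outUrl;
    EAFRead *reader;
    EAFWrite *writer;
}

@property (readonly) EAFRead *reader;

- (void) initAudioProcessor: (NSURL*) filePath;

@end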

And then edit some more stuff, this time in the implementation file:

#include "Dirac.h"
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/time.h>

#import <AVFoundation/AVAudioPlayer.h>
#import <AVFoundation/AVFoundation.h>

#import "AudioProcessor.h"
#import "EAFRead.h"
#import "EAFWrite.h"

double gExecTimeTotal = 0.;

void DeallocateAudioBuffer(float **audio, int numChannels)
{
    if (!audio) return;
    // Free each channel buffer, then the array of channel pointers
    for (long v = 0; v < numChannels; v++) {
        if (audio[v]) {
            free(audio[v]);
            audio[v] = NULL;
        }
    }
    free(audio);
    audio = NULL;
}

float **AllocateAudioBuffer(int numChannels, int numFrames)
{
    // Allocate buffer for output
    float **audio = (float**)malloc(numChannels*sizeof(float*));
    if (!audio) return NULL;
    memset(audio, 0, numChannels*sizeof(float*));
    for (long v = 0; v < numChannels; v++) {
        audio[v] = (float*)malloc(numFrames*sizeof(float));
        if (!audio[v]) {
            DeallocateAudioBuffer(audio, numChannels);
            return NULL;
        }
        else memset(audio[v], 0, numFrames*sizeof(float));
    }
    return audio;
}

/*
 This is the callback function that supplies data from the input stream/file whenever needed.
 It should be implemented in your software by a routine that gets data from the input/buffers.
 The read requests are *always* consecutive, ie. the routine will never have to supply data out
 of order.
 */
long myReadData(float **chdata, long numFrames, void *userData)
{
    // The userData parameter can be used to pass information about the caller (for example, "self") to
    // the callback so it can manage its audio streams.
    if (!chdata)    return 0;
    AudioProcessor *Self = (AudioProcessor*)userData;
    if (!Self)    return 0;
    // we want to exclude the time it takes to read in the data from disk or memory, so we stop the clock until
    // we've read in the requested amount of data
    gExecTimeTotal += DiracClockTimeSeconds();        // .......... stop timer ..........
    OSStatus err = [Self.reader readFloatsConsecutive:numFrames intoArray:chdata];
    // .......... start timer ..........
    return err;
}

@implementation AudioProcessor

@synthesize reader;

// Plays the processed file; called on the main thread once processing is done
- (void)playOnMainThread:(id)param
{
    NSError *error = nil;

    player = [[AVAudioPlayer alloc] initWithContentsOfURL:outUrl error:&error];
    if (error)
        NSLog(@"AVAudioPlayer error %@, %@", error, [error userInfo]);
    player.delegate = self;
    [player play];
}

// Runs on a background thread: reads the input file, pitch shifts it with DIRAC, writes the output file
- (void)processThread:(id)param
{
    NSAutoreleasePool *pool = [[NSAutoreleasePool alloc] init];
    long numChannels = 1;        // DIRAC LE allows mono only
    float sampleRate = 44100.;
    // open input file
    [reader openFileForRead:inUrl sr:sampleRate channels:numChannels];
    // create output file (overwrite if exists)
    [writer openFileForWrite:outUrl sr:sampleRate channels:numChannels wordLength:16 type:kAudioFileAIFFType];
    // DIRAC parameters
    // Here we set our time and pitch manipulation values
    float time      = 1.0;
    float pitch     = 1.0;
    float formant   = 1.0;
    // First we set up DIRAC to process numChannels of audio at 44.1kHz
    // N.b.: The fastest option is kDiracLambdaPreview / kDiracQualityPreview, best is kDiracLambda3, kDiracQualityBest
    // The probably best *default* option for general purpose signals is kDiracLambda3 / kDiracQualityGood
    void *dirac = DiracCreate(kDiracLambdaPreview, kDiracQualityPreview, numChannels, 44100., &myReadData, (void*)self);
    //    void *dirac = DiracCreate(kDiracLambda3, kDiracQualityBest, numChannels, 44100., &myReadData);
    if (!dirac) {
        printf("!! ERROR !!\n\n\tCould not create DIRAC instance\n\tCheck number of channels and sample rate!\n");
        printf("\n\tNote that the free DIRAC LE library supports only\n\tone channel per instance\n\n\n");
        // Nothing to process without a DIRAC instance
        [pool release];
        return;
    }
    // Pass the values to our DIRAC instance
    DiracSetProperty(kDiracPropertyTimeFactor, time, dirac);
    DiracSetProperty(kDiracPropertyPitchFactor, pitch, dirac);
    DiracSetProperty(kDiracPropertyFormantFactor, formant, dirac);
    // upshifting pitch will be slower, so in this case we'll enable constant CPU pitch shifting
    if (pitch > 1.0)
        DiracSetProperty(kDiracPropertyUseConstantCpuPitchShift, 1, dirac);
    // Print our settings to the console
    NSLog(@"Running DIRAC version %s\nStarting processing", DiracVersion());
    // Get the number of frames from the file to display our simplistic progress bar
    SInt64 numf = [reader fileNumFrames];
    SInt64 outframes = 0;
    SInt64 newOutframe = numf*time;
    long lastPercent = -1;
    percent = 0;
    // This is an arbitrary number of frames per call. Change as you see fit
    long numFrames = 8192;
    // Allocate buffer for output
    float **audio = AllocateAudioBuffer(numChannels, numFrames);
    double bavg = 0;
    for(;;) {
        // Display ASCII style "progress bar"
        percent = 100.f*(double)outframes / (double)newOutframe;
        long ipercent = percent;
        if (lastPercent != ipercent) {
            //[self performSelectorOnMainThread:@selector(updateBarOnMainThread:) withObject:self waitUntilDone:NO];
            printf("\rProgress: %3i%% [%-40s] ", (int)ipercent, &"||||||||||||||||||||||||||||||||||||||||"[40 - ((ipercent>100)?40:(2*ipercent/5))] );
            lastPercent = ipercent;
        }
        // .......... start timer ..........
        // Call the DIRAC process function with current time and pitch settings
        // Returns: the number of frames in audio
        long ret = DiracProcess(audio, numFrames, dirac);
        bavg += (numFrames/sampleRate);
        gExecTimeTotal += DiracClockTimeSeconds();       // .......... stop timer ..........
        printf("x realtime = %3.3f : 1 (DSP only), CPU load (peak, DSP+disk): %3.2f%%\n", bavg/gExecTimeTotal, DiracPeakCpuUsagePercent(dirac));
        // Process only as many frames as needed
        long framesToWrite = numFrames;
        unsigned long nextWrite = outframes + numFrames;
        if (nextWrite > newOutframe) framesToWrite = numFrames - nextWrite + newOutframe;
        if (framesToWrite < 0) framesToWrite = 0;
        // Write the data to the output file
        [writer writeFloats:framesToWrite fromArray:audio];
        // Increase our counter for the progress bar
        outframes += numFrames;
        // As soon as we've written enough frames we exit the main loop
        if (ret <= 0) break;
    }
    percent = 100;
    //[self performSelectorOnMainThread:@selector(updateBarOnMainThread:) withObject:self waitUntilDone:NO];
    // Free buffer for output
    DeallocateAudioBuffer(audio, numChannels);
    // destroy DIRAC instance
    DiracDestroy( dirac );
    // Done!
    [reader release];
    [writer release]; // important - flushes data to file
    // start playback on main thread
    [self performSelectorOnMainThread:@selector(playOnMainThread:) withObject:self waitUntilDone:NO];
    [pool release];
}

- (void)audioPlayerDidFinishPlaying:(AVAudioPlayer *)player successfully:(BOOL)flag
{
    // Nothing to do once playback finishes
}

- (void) initAudioProcessor: (NSURL*) filePath
{
    NSLog(@"initAudioProcessor");
    //NSString *inputSound  = [[[NSBundle mainBundle] pathForResource:  @"voice" ofType: @"aif"] retain];
    NSString *outputSound = [[NSHomeDirectory() stringByAppendingString:@"/Documents/"] stringByAppendingString:@"out.aif"];
    inUrl = [filePath retain];
    outUrl = [[NSURL fileURLWithPath:outputSound] retain];
    reader = [[EAFRead alloc] init];
    writer = [[EAFWrite alloc] init];
    // this thread does the processing
    [NSThread detachNewThreadSelector:@selector(processThread:) toTarget:self withObject:nil];
}

- (void)dealloc
{
    [player release];
    [inUrl release];
    [outUrl release];
    [super dealloc];
}

@end
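The one fiddly part of the processing loop above is the bookkeeping around the last chunk: DIRAC hands back a full buffer every call, but the output file should stop at the expected total number of frames. Here is that arithmetic pulled out into a plain C helper so it is easier to see. The function name is made up, and this is just my reading of the example code, not something from the Dirac API.

```c
/* How many frames of the current chunk should be written to the output file.
   framesWritten: frames handed to the writer so far
   chunkFrames:   frames produced per processing call (8192 in the code above)
   totalFrames:   total frames the output file should contain */
static long frames_to_write(long framesWritten, long chunkFrames, long totalFrames)
{
    long remaining = totalFrames - framesWritten;
    if (remaining <= 0) return 0;                          /* already wrote everything */
    return remaining < chunkFrames ? remaining : chunkFrames;
}
```

For a 20000 frame file processed in chunks of 8192, the first two chunks are written whole, the third is trimmed to 3616 frames, and anything after that writes nothing.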

In my AudioController (code in previous blog post), create an AudioProcessor object:

AudioProcessor *audioProcessor = [[AudioProcessor alloc] init];
// initAudioProcessor: expects an NSURL, so pass the recorded file's URL directly
[audioProcessor initAudioProcessor:recordedTmpFiles[recordedTmpFileIdx]];

recordedTmpFiles[recordedTmpFileIdx] is the file URL of the recorded audio.

Just adjust float pitch to change the, well, pitch of your voice. For a chipmunky voice, set pitch to a value greater than 1; for a low voice, set it to a value less than 1.

And that’s it 🙂

Tutorial: The first step to making a ‘Talking’ iPhone app, chipmunkifying your voice!

Yah, I invented a word, “chipmunkifying”, or to make your voice sound like a chipmunk. Why? ‘Talking’ apps are all over the place in the iPhone app store, there’s a talking tom cat, talking bird, talking hippo, talking giraffe, whatever animal you can think of… Basically what those apps do is, you say something, and then the animal will repeat it, in this chipmunk like voice. Oh, you can poke, tickle, hit it too, or whatever…

Oh, I am using Cocos2D as my game engine. So, begin by creating a project using the Cocos2D application template. If you are not familiar with Cocos2D, you can go to its website for instructions on how to download and install it.

Well, so the first step to a ‘Talking’ app is of course, it has to record what you say.

I’ll be using AVAudioRecorder to record my voice. It’s really simple to set up, just follow the instructions on this blog by Jake Wyman. But he uses the iPhone SDK, while I will be using Cocos2D. So just follow his tutorial up to the part about adding frameworks:

From your XCode interface you are going to select the Frameworks folder, ctl>click and choose ADD and then select Existing Frameworks… Then choose both the CoreAudio.Framework and the AVFoundation.Framework

And after we have done that, some coding… Rather copy paste some code from Jake’s.

First create an NSObject subclass named AudioController.

Import AVFoundation and CoreAudio:

#import <AVFoundation/AVFoundation.h>
#import <CoreAudio/CoreAudioTypes.h>

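Set up AudioController as an AVAudioRecorderDelegate. And declare an AVAudioRecorder, and a file URL for recordedTmpFile (where we will temporarily store our audio).

@interface AudioController : NSObject <AVAudioRecorderDelegate>
{
    AVAudioRecorder *recorder;
    NSURL *recordedTmpFile;   // file URL where the recording is temporarily stored
}

- (void) initAudioController;
- (void) record;
- (void) stopRecording;
- (void) play;
- (void) unloadAudioController;

@end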

We instantiate an AVAudioSession in a function called initAudioController (basically the code inside Jake’s viewDidLoad):

- (void) initAudioController
{
    NSError *error = nil;
    //Get the shared AVAudioSession instance.
    AVAudioSession *audioSession = [AVAudioSession sharedInstance];
    //Set up the audioSession for playback and record.
    //We could just use record and then switch it to playback later, but
    //since we are going to do both let's set it up once.
    [audioSession setCategory:AVAudioSessionCategoryPlayAndRecord error:&error];
    //Activate the session
    [audioSession setActive:YES error:&error];
}

And then our record function:

-(void) record
{
    NSError *error = nil;
    NSMutableDictionary *recordSetting = [[NSMutableDictionary alloc] init];
    [recordSetting setValue:[NSNumber numberWithInt:kAudioFormatAppleIMA4] forKey:AVFormatIDKey];
    [recordSetting setValue:[NSNumber numberWithFloat:44100.0] forKey:AVSampleRateKey];
    [recordSetting setValue:[NSNumber numberWithInt:2] forKey:AVNumberOfChannelsKey];
    // fileURLWithPath: returns an autoreleased object, so retain it while we keep it around
    [recordedTmpFile release];
    recordedTmpFile = [[NSURL fileURLWithPath:[NSTemporaryDirectory() stringByAppendingPathComponent:@"recording.caf"]] retain];
    [recorder release];
    recorder = [[AVAudioRecorder alloc] initWithURL:recordedTmpFile settings:recordSetting error:&error];
    [recordSetting release];
    [recorder setDelegate:self];
    [recorder prepareToRecord];
    [recorder record];
}

And now, to play whatever was recorded in a chipmunk voice. In Jake’s project he uses AVAudioPlayer to play his sound, but that isn’t going to work for me, because AVAudioPlayer doesn’t allow me to change the playback speed.

So instead of using that, I will be using CocosDenshion’s CDSoundEngine. I am reading Chapter 9 : Playing Sounds With CocosDenshion of Cocos2d for iPhone 0.99 Beginner’s Guide:

According to Pablo Ruiz, we need to import more frameworks to get CDSoundEngine working:

… include OpenAL and AudioToolbox frameworks in your project.

More imports:

#import "cocos2d.h"
#import "CocosDenshion.h"

And then declare a CDSoundEngine:

CDSoundEngine *soundEngine;

In initAudioController function, we initialize the soundEngine.

soundEngine = [[CDSoundEngine alloc] init: kAudioSessionCategory_PlayAndRecord];

NSArray *defs = [NSArray arrayWithObjects: [NSNumber numberWithInt:1],nil];
[soundEngine defineSourceGroups:defs];

And then we play:

-(void) play
{
    NSString *filePath = [NSTemporaryDirectory() stringByAppendingPathComponent:@"recording.caf"];
    // Buffer 0 holds our single recording
    [soundEngine loadBuffer:0 filePath:filePath];
    [soundEngine playSound:0 sourceGroupId:0 pitch:2.0f pan:0.0f gain:1.0f loop:NO];
}

Take note of the pitch property: it says 2.0f. What does it mean? The setting for normal pitch is 1.0f, if you increase its value, you get a higher pitch, also known as a chipmunked voice, if you decrease the pitch, you get this low kind of creepy voice.

We also need to make a function for stopping the recording and then start playing:

-(void) stopRecording
{
    [recorder stop];
    [self play];
}

And then we make a function for unloading the AudioController:

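- (void) unloadAudioController
{
    NSError *error = nil;
    NSFileManager *fm = [NSFileManager defaultManager];
    // Delete the temporary recording
    [fm removeItemAtPath:[recordedTmpFile path] error:&error];
    [recordedTmpFile release];
    recordedTmpFile = nil;
    [recorder release];   // release it; never call dealloc directly
    recorder = nil;
    [soundEngine release];
    soundEngine = nil;
}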


Okay, now that we have AudioController done, it’s time to call it in our HelloWorld scene. Yes, the HelloWorld that comes with the default template. I also added a BOOL isRecording to keep track of whether we are recording or playing.


#import "cocos2d.h"
#import "AudioController.h"

// HelloWorld Layer
@interface HelloWorld : CCLayer
{
    AudioController *audioLayer;
    BOOL isRecording;   // YES while recording, NO while idle or playing back
}

+(id) scene;

@end


For HelloWorld.m, in init, add swallowedTouches and change the “Hello World” label to “Say something…” or “Speak!” or “Talk to me” or whatever.

[[CCTouchDispatcher sharedDispatcher] addTargetedDelegate:self priority:0 swallowsTouches:YES];
CCLabel* label = [CCLabel labelWithString:@"Say something…" fontName:@"Marker Felt" fontSize:32];

Also in init, initialize audioLayer, and set isRecording to NO

audioLayer = [[AudioController alloc] init];
[audioLayer initAudioController];

isRecording = NO;

And then, since I am too lazy to add buttons, the user simply taps anywhere on the iPhone screen once to record, and then taps again to stop recording and play the audio.

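- (BOOL)ccTouchBegan:(UITouch *)touch withEvent:(UIEvent *)event
{
    if (isRecording)
    {
        // Second tap: stop recording and play it back
        [audioLayer stopRecording];
        isRecording = NO;
    }
    else
    {
        // First tap: start recording
        [audioLayer record];
        isRecording = YES;
    }
    return YES;   // we handled (swallowed) the touch
}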

And in HelloWorld’s dealloc add:

[audioLayer unloadAudioController];

And that’s it 🙂

You can record your voice and play it back sounding like a chipmunk 🙂

For any questions (or if you find any errors), feel free to contact me through Twitter, here, email, Facebook or whatever.