iOS Tutorial: Create iPhone video chat app using Parse and Opentok (tokbox)

Disclaimer: Creating your first iPhone video chat app isn’t rocket science, but it is far from building a sandcastle either. As a result, this is a very lengthy tutorial: a 6000+ word mammoth, if you ask me. And I saw no reason to fragment it into episodes. Serve yourself a cup of your favorite beverage, and proceed at your own risk. (Hint: nothing beats coffee when it comes to headaches.)

Disclaimer: Are you an anxious googler who doesn’t like lengthy tutorials? Here is the entire project. (Licensing details are given at the end.) But you won’t be able to run an iPhone video chat app straight out of it. Go figure it out yourself. If you don’t want to, read on.

In a market flooded with social networking apps, video chat is a killer weapon that you can wield to beat your rivals on the App Store in one go. According to one study, 93% of human interaction happens through a visual interface.

What the heck. The need to chat face-to-face doesn’t need statistical justification. But how do such apps work? In a world full of third parties, the question is more one of design than of technical implementation. If you are scared of the word mammoth, you need to live with it, because that’s the name of the beast we are dealing with.

By the way, here is a warm-up snap of the end result of this iOS tutorial:

iPhone video chat

Notice the mammoth, and don’t be scared because step by step, we are going to dig its grave. And don’t pay attention to the monkey, at least for now.

The Basics of a video Chat app:

While leading video chat providers do it with their own streaming servers, cloud solutions have come to the fore to assist small to medium app providers, helping them get their app “out there” quickly while taking a slice of their growing revenue on a per-request basis. Opentok (http://www.Tokbox.com/) is one of those few pioneers, and it provides a nice, easy-to-integrate SDK for adding a video chat feature to your next iPhone app.

While Opentok provides their iOS API and iOS SDK with two great examples of iOS implementation (here and here), there is still one piece left: user management. Opentok provides a nice platform for video chat, but between whom? Let’s take a deeper dive into how streaming actually works between users of services like Skype, MSN, and Yahoo video chat.

A streaming server works on the fundamentals of an HTTP web server. In a pretty raw representation, all it recognizes is request and response, without caring who sent it. An HTTP server does just that. It does not worry about state, user IDs, passwords or any such access token mechanism. In other words, session management is absent.

The scenario changes when there arises a need to identify the user, authenticate him, and keep him authenticated during his entire browsing experience. A server needs to remember a user, and whatever other entities come linked with him. These requirements become more stringent while developing mobile software like an iPhone video chat app, where identities are crucial: they can make or break the authenticity of your app. In large web portals such as Yahoo.com or Amazon.com, dedicated authentication servers generate a unique identity for each user, and supply application servers with these goodies so that they can track the users until they log out.

To perform its task, a streaming server relies on only one entity to recognize who sent the request and whom to respond to: the session ID. It neither knows nor cares who generated it. As long as it receives a valid session ID, it keeps replying to streaming requests. Any user with a valid session ID is, in principle, automatically entitled to view the feed of any other user who is using the same session ID. Session IDs are of the form:

2_MX4yNjU5MzIyMn5-VHVlIEFwciAyMyAwMjoxMTo0NCBQRFQgMjAxM34wLjEzMTIwMTYyfg
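
To make the "session ID is all the server knows" idea concrete, here is a toy model written in JavaScript (the language Parse cloud code uses). It is only a sketch and says nothing about how Opentok is actually implemented; ToyStreamingServer, publish and feedsVisibleTo are names I made up for illustration.

```javascript
// Toy model only: a "streaming server" that knows session IDs and nothing else.
// All names here are hypothetical; this is not Opentok's real API.
class ToyStreamingServer {
  constructor() {
    // sessionId -> array of feed labels published into that session
    this.feedsBySession = new Map();
  }

  // Publish a feed into a session. The server never asks who you are.
  publish(sessionId, feedLabel) {
    if (!this.feedsBySession.has(sessionId)) {
      this.feedsBySession.set(sessionId, []);
    }
    this.feedsBySession.get(sessionId).push(feedLabel);
  }

  // Anyone presenting the same session ID can see every feed in it.
  feedsVisibleTo(sessionId) {
    return (this.feedsBySession.get(sessionId) || []).slice();
  }
}

const server = new ToyStreamingServer();
server.publish("session-A", "camera of caller");
server.publish("session-A", "camera of receiver");
server.publish("session-B", "camera of someone else");

console.log(server.feedsVisibleTo("session-A"));
```

Nothing in this model knows about users, which is exactly the gap the rest of this tutorial fills.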

You definitely know where this is going: to keep track of who wants to connect to whom in a more real-world way, like we do in Yahoo Messenger or Skype, we need another server that keeps track of users. User management is essential in any social networking app. How far you take it is up to you: you can store big data including addresses, phone numbers and likes, or you can pick 3-4 fields of your choice to make it easier on the user. But the fact is you need to do it. Any software that relies on user interaction cannot do without it, and our iPhone video chat app is no exception.

To handle user management, there are again a number of cloud solutions available. For the purpose of this tutorial, I have chosen Parse.com because it boasts 60k+ live apps and claims to have the easiest of APIs, and those claims, as I have experienced, are right!

Some Trivia: As I am typing this, Parse.com has been acquired by Facebook. Irrespective of whether this is a good omen, I am continuing to scribble this.

The objectives of this tutorial, therefore, are to show:

  • How to enable Parse.com users to see each other (your favorite messenger’s who’s-online sort of list)

  • How to make them talk with each other using the Opentok video streaming feature in your next great iPhone video chat app

Disclaimer: I do not work for Tokbox, nor for Parse.com (or, for that matter, Facebook!). Then why am I writing this tutorial? There already exists a nice attempt to explain the same concept: the Broadcast tutorial by Tokbox itself.

Writing an entire iPhone video chat tutorial this long felt like overkill at first. But the need sprang from the fact that the Broadcast tutorial talks about broadcasts, where only one party’s feed is viewable to others. Two-way chat is the growing need of the time. Also, looking at the Broadcast tutorial for the first time, I felt the learning curve was quite steep for a programmer new to video chat. I strongly felt I could simplify many of the concepts explained there.

My take? If you can’t understand it, explain it. Let’s go.

Setup (Parse.com and Tokbox.com) for your iPhone video chat app – LiveSessions:

My showcase app, LiveSessions, requires some configuration on both the Tokbox.com and Parse.com profiles: the server side. Parse.com needs your app, and so does Tokbox.com, although it calls it a Project.

I assume you know enough to configure an app on Parse.com. In addition to it, you also need 2 data tables – ActiveUsers and ActiveSessions, though there is no need to pre-populate them at any point. Here is a snap of what their column lists will look like:

ActiveUsers:
  userID – String
  userLocation – GeoPoint
  userTitle – String

ActiveSessions:
  callerID – String
  callerTitle – String
  receiverID – String
  sessionID – String
  publisherToken – String
  subscriberToken – String
  isVideo – Boolean
  isAudio – Boolean

Similarly, on Tokbox.com, once you log in, go to the dashboard and create a new project. This automatically creates an API key and an API secret. Note that these are your credentials as a Tokbox developer, not as a chat user. The API key is more like your user ID and the API secret is like a password. The resulting Tokbox dashboard screen will look like this (the real API key and text are hidden):

iPhone video chat

Tired? There is still one piece left to be done. As I discussed at length earlier, we need to connect our streaming application server (Tokbox.com) with the authentication server (Parse.com). Unless this is done, the streaming server has no way to know which user is calling whom, because all it knows about is the session ID. Our iPhone video chat user, on the other hand, only knows about his own user credentials supplied to him by Parse.com.

As is apparent, it is Parse.com’s job to connect the two. That is:

  • To intercept the logged-in user’s request to chat (the caller).
  • To turn that request into a Tokbox session: get the session ID and other necessary information (the tokens) from Tokbox.
  • To inform the caller and the receiver (both Parse.com users) about the same.

Since both users now have a session ID and a token, both received via Parse.com, they can seamlessly communicate through the Tokbox streaming server.

If you think about it again, this is not unlike the token guy at your nearby bank. Once he hands you a token with a number on it, the public address system connects you with the next available teller counter, thus removing the token guy from the workflow. However, depending on the functionality, we may need our token guy (Parse.com) around a bit longer. In fact, we will need him even after the chat ends, and we will soon see why. But for now, let’s see what Parse.com does as the token guy.

chat-arch

To accomplish the above, Parse.com must obtain sessionID, publisherToken and subscriberToken from Tokbox, and that is where Parse cloud code comes into the picture. In the Cloud Code section, you must upload some code so that the resulting screen looks like this:

iPhone video chat

The process of uploading (oops, deploying) cloud code on Parse.com is described in this source tutorial (Setting Up section) and here, and it is far better than I could explain here. So let’s skip it to maintain the scope. However for simplicity’s sake, I have included it step by step in readme.txt that comes within the code.

But I would take a few moments to explain what this code does, and why. The iPhone app user initiates a call to another user. No, he does not make a phone call. Remember, it’s the LiveSessions app’s job to handle the entire call, just like Skype or Yahoo Messenger does. What our iPhone app needs to do under the hood is fairly simple: it saves a row to the ActiveSessions Parse table we just created. There, it stores the caller’s user ID (callerID) and the receiver’s user ID (receiverID), among other things.

Now what this cloud code does is something quite magical, yet simple. It intercepts the save operation in its beforeSave cloud trigger (nicely elaborated here). From within beforeSave, the Opentok JavaScript API takes over. The Opentok-supplied function createSession generates a session ID. Another function, Opentok.generateToken, creates a publisher token or a subscriber token depending on the role argument passed, which decides whether you want to publish your own video feed (Opentok.ROLE.PUBLISHER) or see the other user’s video feed (Opentok.ROLE.SUBSCRIBER) using that session ID. (It is unclear from my experience if there is any difference between the two. For the scope of this discussion, let us generate both, as we need two-way video feeds anyway.)

Remember that we are within the beforeSave trigger, so we already have a handle to the object being saved: the ActiveSessions row. By simply setting its respective members, we can save three distinct items to our Parse.com database: sessionID, publisherToken and subscriberToken, all three of them columns in the ActiveSessions table.
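
If the trigger idea still feels abstract, the sketch below walks through the same steps in plain JavaScript. It is not the cloud code shipped with the project: the real Parse trigger and the real Opentok cloud module are replaced by stand-ins so that the sketch runs anywhere, and fakeOpenTok, makeRow and beforeSaveActiveSession are hypothetical names of my own. Take the actual call names and signatures from the cloud code in the project.

```javascript
// Sketch of what the beforeSave step does, with stand-ins instead of real SDKs.
// fakeOpenTok, makeRow and beforeSaveActiveSession are hypothetical names.

const ROLE = { PUBLISHER: "publisher", SUBSCRIBER: "subscriber" };

// Stand-in for the Opentok cloud module: hands out a session ID and tokens.
const fakeOpenTok = {
  nextId: 1,
  createSession(callback) {
    callback(null, "fake-session-" + this.nextId++);
  },
  generateToken(sessionId, role) {
    return "fake-token:" + sessionId + ":" + role;
  },
};

// Stand-in for the ActiveSessions row that is about to be saved.
function makeRow(fields) {
  const data = Object.assign({}, fields);
  return {
    get: (key) => data[key],
    set: (key, value) => { data[key] = value; },
  };
}

// The hook: it runs before the row is stored, so whatever it sets is saved too.
function beforeSaveActiveSession(row, done) {
  fakeOpenTok.createSession((error, sessionId) => {
    if (error) {
      done(error); // reject the save; the caller sees a failed save
      return;
    }
    row.set("sessionID", sessionId);
    row.set("publisherToken", fakeOpenTok.generateToken(sessionId, ROLE.PUBLISHER));
    row.set("subscriberToken", fakeOpenTok.generateToken(sessionId, ROLE.SUBSCRIBER));
    done(null); // let the save go ahead with the three new columns filled in
  });
}

const row = makeRow({ callerID: "userA", receiverID: "userB" });
let saveError = "not called";
beforeSaveActiveSession(row, (error) => { saveError = error; });
console.log(row.get("sessionID"), row.get("publisherToken"));
```

The point to notice is that the client only ever sends callerID and receiverID; the session ID and tokens are attached on the server, so the API secret never has to live inside the iPhone app.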

Well, that’s that. What? You are done with the back end of your first iPhone video chat app! Congratulations!

But how? All we required for two users to connect via the streaming server is a user ID (the session ID) and a password (the token), and we have both. Now all we have to do is make them available to the iOS client via Parse.com. Not that it’s as easy as 1-2-3, but the mammoth has bitten the dust already.

Serve yourself a cup of hot coffee as you wait to see one of your friends on your iPhone video app screen! Well, not quite so quickly. Before you jump in, you can do your friends a favor: read this disclaimer in case it helps any of your Android friends:

Disclaimer: The server setup that you have done on Parse.com isn’t restricted to iOS apps alone. You can use same cloud code and table structure for your Android based chat apps too, using Parse.com’s Android SDK.

iOS client – the other part of the deal:

Now that the back end is accomplished, let’s focus on how the iOS client, our own LiveSessions iPhone app, keeps its part of the deal. The major tasks we aim to cover are:

1) Initiate iPhone video chat call: save an ActiveSessions row (we already described the server part of it above as part of the cloud code).

2) Handle incoming call: check the Parse.com database for an incoming video call, described down the line.

Let’s tackle each of them, one by one. But first and foremost, let’s set up the basic iOS project and go over everything that is needed.

iPhone Video Chat Initial Project Setup:

Open Xcode and set up a Single View Application with default options. Name it LiveSessions. In the storyboard, set up 2 scenes: one for the users list, named LSViewController (derived from UIViewController), and another for hosting the video chat view, LSStreamingViewController (again, a subclass of UIViewController).

Next, add a UITableView object to the LSViewController scene through the storyboard. This table view must maintain a list of Parse.com users who log into the LiveSessions app at any time. LSViewController will handle all the chores related to UITableView, nothing unusual. Do not forget to implement the UITableViewDelegate and UITableViewDataSource protocols in LSViewController so as to handle table view related stuff.

In addition to the above, we also need a helper class called ParseHelper which wraps our calls to Parse. All of its members and functions would be static.

To use Parse and Opentok framework, we need to link them, as well as some of the required frameworks used by both of them.

Parse framework can be obtained from here.

Opentok SDK can be obtained from here. This link also explains at length what all you need to do in order to link Opentok framework successfully.

Next, you should add some libraries to LiveSessions by selecting it and going to the Build Phases -> Link Binary With Libraries section. After adding a number of frameworks, this section should look like this:

At the end of linking everything, your Project tree should look like this:

Project-tree

(Notice the Opentok.bundle that comes as part of the Opentok SDK. Also see the three other libraries that go below it in order to link everything together.)

Before we proceed, let’s have one look at how beautiful (!) our storyboard looks:

Step 1 – Initiate Video Call:

Parse.com acts as a mediator between caller, Opentok streaming engine and receiver. The first and foremost requirement is to generate sessionID, publisherToken and subscriberToken, so that both clients can seamlessly connect via Opentok once session is established. We already know the Parse.com cloud code does that. But how will LiveSessions app invoke the cloud code?

The following code not only stores an ActiveSessions object to Parse, but also invokes cloud code (beforeSave trigger) we discussed above that generates sessionID, publisherToken and subscriberToken – and they are eventually stored into ActiveSessions table itself.

//ParseHelper.m
//will initiate the call by saving session
//if there is a session already existing, do not save,
//just pop an alert
+(void)saveSessionToParse:(NSDictionary *)inputDict
{    
    NSString * receiverID = [inputDict objectForKey:@"receiverID"];

    //check if the recipient is either the caller or receiver in one of the activesessions.
    //note: no quotes around %@, otherwise NSPredicate treats it as the literal
    //text "%@" instead of substituting receiverID.
    NSPredicate *predicate = [NSPredicate predicateWithFormat:
                              @"receiverID = %@ OR callerID = %@", receiverID, receiverID];
    PFQuery *query = [PFQuery queryWithClassName:@"ActiveSessions" predicate:predicate];

    [query getFirstObjectInBackgroundWithBlock:^
    (PFObject *object, NSError *error)
    {
        if (!object)
        {
            NSLog(@"No session with receiverID exists.");
            [self storeToParse:inputDict];
        }
        else
        {
           [[NSNotificationCenter defaultCenter] postNotification:[NSNotification notificationWithName:kReceiverBusyNotification object:nil]];
           return;
        }
    }];
}

+(void) storeToParse:(NSDictionary *)inputDict
{
    __block PFObject *activeSession = [PFObject objectWithClassName:@"ActiveSessions"];
    NSString * callerID = [inputDict objectForKey:@"callerID"];
    if (callerID)
    {
        [activeSession setObject:callerID forKey:@"callerID"];
    }
    bool bAudio = [[inputDict objectForKey:@"isAudio"]boolValue];
    [activeSession setObject:[NSNumber numberWithBool:bAudio] forKey:@"isAudio"];

    bool bVideo = [[inputDict objectForKey:@"isVideo"]boolValue];
    [activeSession setObject:[NSNumber numberWithBool:bVideo] forKey:@"isVideo"];

    NSString * receiverID = [inputDict objectForKey:@"receiverID"];
    if (receiverID)
    {
        [activeSession setObject:receiverID forKey:@"receiverID"];
    }

    //callerTitle (guard on callerTitle itself; setting a nil object would raise an exception)
    NSString * callerTitle = [inputDict objectForKey:@"callerTitle"];
    if (callerTitle)
    {
        [activeSession setObject:callerTitle forKey:@"callerTitle"];
    }

    [activeSession saveInBackgroundWithBlock:^(BOOL succeeded, NSError* error)
    {
        if (!error)
        {
             NSLog(@"sessionID: %@, publisherToken: %@ , subscriberToken: %@", activeSession[@"sessionID"],activeSession[@"publisherToken"],
                   activeSession[@"subscriberToken"]);

             LSAppDelegate * appDelegate = [[UIApplication sharedApplication] delegate];
             appDelegate.sessionID = activeSession[@"sessionID"];
             appDelegate.subscriberToken = activeSession[@"subscriberToken"];
             appDelegate.publisherToken = activeSession[@"publisherToken"];
             appDelegate.callerTitle = activeSession[@"callerTitle"];
             [[NSNotificationCenter defaultCenter] postNotification:[NSNotification notificationWithName:kSessionSavedNotification object:nil]];
         }
         else
         {
             NSLog(@"savesession error!!! %@", [error localizedDescription]);
             NSString * msg = [NSString stringWithFormat:@"Failed to save outgoing call session. Please try again.  %@", [error localizedDescription]];
             [self showAlert:msg];
         }         
     }];
}
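
The busy check at the top of saveSessionToParse boils down to a very small rule. Here is that rule as a JavaScript sketch over an in-memory array instead of a Parse query; isUserBusy and the sample rows are made up for illustration.

```javascript
// A user is busy if any active session lists them as caller or receiver.
// isUserBusy and the sample data are hypothetical, for illustration only.
function isUserBusy(activeSessions, userId) {
  return activeSessions.some(
    (session) => session.callerID === userId || session.receiverID === userId
  );
}

const activeSessions = [{ callerID: "userA", receiverID: "userB" }];

console.log(isUserBusy(activeSessions, "userB")); // true: already the receiver in a call
console.log(isUserBusy(activeSessions, "userA")); // true: already the caller in a call
console.log(isUserBusy(activeSessions, "userC")); // false: free to be called
```

Keep in mind that checking first and saving afterwards are two separate steps, so two callers who check at the same moment can both find the receiver free.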

Once saveSessionToParse and storeToParse have run, we should have a sessionID, a publisherToken and a subscriberToken in our Parse.com ActiveSessions table. Alright, but who will execute them? A lot of things still remain unanswered. For example, where do all the argument values (receiverID, callerID) come from? We deliberately skipped that part, because establishing the session was most important. The callerID and receiverID parameters that we used above are simply the user IDs generated by the Parse.com PFUser object. You can have your own way of registering and authenticating a user. In LiveSessions, we just store each user in the ActiveUsers table, using only a user title of his / her own choice. No emails, passwords or verification. And here is the code that is responsible for it:

//ParseHelper.m
+(void) showUserTitlePrompt
{
    UIAlertView *userNameAlert = [[UIAlertView alloc] initWithTitle:@"LiveSessions" message:@"Enter your name:" delegate:self cancelButtonTitle:nil otherButtonTitles:@"OK", nil];
    userNameAlert.alertViewStyle = UIAlertViewStylePlainTextInput;
    userNameAlert.tag = kUIAlertViewTagUserName;
    [userNameAlert show];
}

+(void) anonymousLogin
{
    loggedInUser = [PFUser currentUser];
    if (loggedInUser)
    {
        [self showUserTitlePrompt];       
        return;
    }

    [PFAnonymousUtils logInWithBlock:^(PFUser *user, NSError *error)
     {
         if (error)
         {
             NSLog(@"Anonymous login failed.%@", [error localizedDescription]);
             NSString * msg = [NSString stringWithFormat:@"Failed to login anonymously. Please try again.  %@", [error localizedDescription]];
             [self showAlert:msg];
         }
         else
         {            
             //keep the user returned by the anonymous login
             loggedInUser = user;
             [self showUserTitlePrompt];
         }
     }];
}

What this does is simple: when the app launches, it checks for the locally stored Parse user ([PFUser currentUser]), and if one does not exist, performs an anonymous login, which creates a PFUser object in the Parse.com Users table. What is important to us is the static loggedInUser object that we use to store the currently logged-in user. At the end of a successful login, the showUserTitlePrompt function prompts the user to enter a title of his / her choice.

Fine, but what happens when the user enters it? Well, quite a number of things. For a start, here is how LiveSessions handles it:

//ParseHelper.m
+ (void)alertView:(UIAlertView *)alertView clickedButtonAtIndex:(NSInteger)buttonIndex
{
    if (kUIAlertViewTagUserName == alertView.tag)
    {
        //let's defer saving the title till we have the location.
        //saveUserWithLocationToParse will handle it.
        LSAppDelegate * appDelegate = [[UIApplication sharedApplication] delegate];
        appDelegate.userTitle = [[alertView textFieldAtIndex:0].text copy];
        appDelegate.bFullyLoggedIn = YES;

        //fire appdelegate timer
        [appDelegate fireListeningTimer];
        [[NSNotificationCenter defaultCenter] postNotification:[NSNotification notificationWithName:kLoggedInNotification object:nil]];
    }
    else if (kUIAlertViewTagIncomingCall == alertView.tag)
    {
        if (buttonIndex != [alertView cancelButtonIndex])   //accept the call
        {
            //accept the call
            [[NSNotificationCenter defaultCenter] postNotification:[NSNotification notificationWithName:kIncomingCallNotification object:nil]];
        }
        else
        {
            //user did not accept call, restart timer          
            //start polling for new call.
            [self setPollingTimer:YES];
        }
    }
}

Notice the part under the tag kUIAlertViewTagUserName. This code tells LiveSessions that the user is now fully logged in, along with an identification (title) of his / her choice. This title will eventually be stored in the ActiveUsers table as userTitle, but with one more thing: the user’s current location. Yes, LiveSessions is a location-aware app. To obtain the user’s location, ParseHelper.m posts a kLoggedInNotification notification to LSViewController. LSViewController has CLLocationManager code inside it which tracks the user’s current location. At the end, once we have everything, the entire user (title, user ID and location) is saved into the ActiveUsers table.

Here is what goes inside LSViewController to obtain user’s current location, and call to Parse wrapper for storing it to ActiveUsers table:

//LSViewController.m
//Called in response of kLoggedInNotification 
- (void) didLogin
{
   [self startUpdate];
}

#pragma mark - Location methods
//this will invoke locationManager to track user's current location
- (void)startUpdate
{
    if (locationManager)
    {
        [locationManager stopUpdatingLocation];
    }
    else
    {
        locationManager = [[CLLocationManager alloc] init];
        [locationManager setDelegate:self];
        [locationManager setDesiredAccuracy:kCLLocationAccuracyBestForNavigation];
        [locationManager setDistanceFilter:30.0];
    }

    [locationManager startUpdatingLocation];
}

//stop tracking location
- (void)stopUpdate
{
    if (locationManager)
    {
        [locationManager stopUpdatingLocation];
    }
}

//this will store finalized user location. 
//once done, it will save it in ActiveUsers row and then fetch nearer users to show in table.
- (void)locationManager:(CLLocationManager *)manager
    didUpdateToLocation:(CLLocation *)newLocation
           fromLocation:(CLLocation *)oldLocation
{  

    CLLocationDistance meters = [newLocation distanceFromLocation:oldLocation];
    //discard if inaccurate, or if user hasn't moved much.
    if (meters != -1 && meters < 50.0)
        return;

    NSLog(@"## Latitude  : %f", newLocation.coordinate.latitude);
    NSLog(@"## Longitude : %f", newLocation.coordinate.longitude);

    appDelegate.currentLocation = newLocation;

    //pause the updates, until didUserLocSaved is called
    //via kUserLocSavedNotification notification, to avoid multiple saves.
    [self stopUpdate];

    PFUser * thisUser = [ParseHelper loggedInUser] ;

    [ParseHelper saveUserWithLocationToParse:thisUser :[PFGeoPoint geoPointWithLocation:appDelegate.currentLocation]];
    [self fireNearUsersQuery:RANGE_IN_MILES :appDelegate.currentLocation.coordinate :YES];
}

The first unknown in the above code is the call to the fireNearUsersQuery function, which serves the front end. We will come to it later. The other unknown is the saveUserWithLocationToParse function, which fills the gaps left so far to complete the back end. It belongs to ParseHelper.m, and here it goes. There is nothing unusual about storing the user, and the table acts as our own little user repository. The object ID of the saved ActiveUsers row is stored for later use inside activeUserObjectID.

//ParseHelper.m
+ (void) saveUserWithLocationToParse:(PFUser*) user :(PFGeoPoint *) geopoint
{
    __block PFObject *activeUser;

    PFQuery *query = [PFQuery queryWithClassName:@"ActiveUsers"];
    [query whereKey:@"userID" equalTo:user.objectId];
    [query findObjectsInBackgroundWithBlock:^(NSArray *objects, NSError *error)
    {
        if (!error)
        {
            // if user is active user already, just update the entry
            // otherwise create it.
            if (objects.count == 0)
            {
                activeUser = [PFObject objectWithClassName:@"ActiveUsers"];
            }
            else
            {                
                activeUser = (PFObject *)[objects objectAtIndex:0];
            }
            LSAppDelegate * appDelegate = [[UIApplication sharedApplication] delegate];
            [activeUser setObject:user.objectId forKey:@"userID"];
            [activeUser setObject:geopoint forKey:@"userLocation"];
            [activeUser setObject:appDelegate.userTitle forKey:@"userTitle"];
            [activeUser saveInBackgroundWithBlock:^(BOOL succeeded, NSError *error)
            {
                if (error)
                {
                    NSString * errordesc = [NSString stringWithFormat:@"Save to ActiveUsers failed.%@", [error localizedDescription]];
                    [self showAlert:errordesc];
                    NSLog(@"%@", errordesc);
                }
                else
                {
                    NSLog(@"Save to ActiveUsers succeeded.");
                    activeUserObjectID = activeUser.objectId;

                    NSLog(@"%@", activeUserObjectID);
                }
                [[NSNotificationCenter defaultCenter] postNotification:[NSNotification notificationWithName:kUserLocSavedNotification object:nil]];
            }];
        }
        else
        {
            NSString * msg = [NSString stringWithFormat:@"Failed to save updated location. Please try again.  %@", [error localizedDescription]];
            [self showAlert:msg];
        }
    }];
}

The code so far ensured a user is saved inside ActiveUsers table. We also saw how he / she can initiate a video call to another user, by creating an ActiveSessions object. But whom does the user chat with?

We must also present a list of users to logged on user to chat with – equivalent of Yahoo/Skype friend’s list. Sending friend requests through email or any other means would be quite an overkill for our tutorial’s scope. To keep things minimal, we don’t even ask our users to enter their email ID for registration.

Instead, we have chosen a unique way to test out the video chat feature: show a list of users who are geographically within a specified radius, say 200 miles. Parse.com already has a PFGeoPoint-related query mechanism which makes our task easier.

The other unknown from the location code, fireNearUsersQuery, goes as below. It fills up the data source for the LSViewController table view: an NSMutableArray of dictionaries holding the users’ IDs and titles:

//LSViewController.m
//this method polls for new users that gets added / removed from surrounding region.
//distanceinMiles - range in Miles
//bRefreshUI - whether to refresh table UI
//argCoord - location around which to execute the search.
-(void) fireNearUsersQuery : (CLLocationDistance) distanceinMiles :(CLLocationCoordinate2D)argCoord :(bool)bRefreshUI
{
    CGFloat miles = distanceinMiles;
    NSLog(@"fireNearUsersQuery %f",miles);

    PFQuery *query = [PFQuery queryWithClassName:@"ActiveUsers"];
    [query setLimit:1000];
    [query whereKey:@"userLocation"
       nearGeoPoint:
     [PFGeoPoint geoPointWithLatitude:argCoord.latitude longitude:argCoord.longitude] withinMiles:miles];    

    //clear all existing rows: first from the data source, then from the table view.
    [m_userArray removeAllObjects];
    [m_userTableView reloadData];    

    [query findObjectsInBackgroundWithBlock:^(NSArray *objects, NSError *error)
    {
        if (!error)
        {
            for (PFObject *object in objects)
            {
                //if for this user, skip it.
                NSString *userID = [object valueForKey:@"userID"];
                NSString *currentuser = [ParseHelper loggedInUser].objectId;
                NSLog(@"%@",userID);
                NSLog(@"%@",currentuser);

                if ([userID isEqualToString:currentuser])
                {
                    NSLog(@"skipping - current user");
                    continue;
                }

                NSString *userTitle = [object valueForKey:@"userTitle"];

                NSMutableDictionary * dict = [NSMutableDictionary dictionary];
                [dict setObject:userID forKey:@"userID"];
                [dict setObject:userTitle forKey:@"userTitle"];

                // TODO: if reverse-geocoder is added, userLocation can be converted to
                // meaningful placemark info and user's address can be shown in table view.
                // [dict setObject:userTitle forKey:@"userLocation"];
                [m_userArray addObject:dict];
            }

            //when done, refresh the table view
            if (bRefreshUI)
            {
                [m_userTableView reloadData];
            }
        }
        else
        {
            NSLog(@"%@",[error description]);
        }
    }];
}
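
For some intuition about what a "within N miles" query has to decide, here is a JavaScript sketch that filters in-memory rows by great-circle distance using the haversine formula. This is an illustration under my own assumptions, not Parse’s implementation: milesBetween, usersWithin, the sample users and the Earth radius of 3958.8 miles are all choices I made for the example.

```javascript
const EARTH_RADIUS_MILES = 3958.8; // mean Earth radius, an approximation

function toRadians(degrees) {
  return (degrees * Math.PI) / 180;
}

// Great-circle (haversine) distance between two {latitude, longitude} points, in miles.
function milesBetween(a, b) {
  const dLat = toRadians(b.latitude - a.latitude);
  const dLon = toRadians(b.longitude - a.longitude);
  const h =
    Math.sin(dLat / 2) ** 2 +
    Math.cos(toRadians(a.latitude)) *
      Math.cos(toRadians(b.latitude)) *
      Math.sin(dLon / 2) ** 2;
  // Clamp to 1 so rounding errors cannot push asin out of its domain.
  return 2 * EARTH_RADIUS_MILES * Math.asin(Math.min(1, Math.sqrt(h)));
}

// Keep users inside the radius, skipping the logged-in user.
function usersWithin(users, center, radiusMiles, currentUserID) {
  return users.filter(
    (user) =>
      user.userID !== currentUserID &&
      milesBetween(center, user.userLocation) <= radiusMiles
  );
}

const me = { latitude: 37.7749, longitude: -122.4194 }; // San Francisco
const users = [
  { userID: "me", userTitle: "Me", userLocation: me },
  // San Jose, a few dozen miles away
  { userID: "u1", userTitle: "Nearby", userLocation: { latitude: 37.3382, longitude: -121.8863 } },
  // New York, on the other coast
  { userID: "u2", userTitle: "Far away", userLocation: { latitude: 40.7128, longitude: -74.006 } },
];

console.log(usersWithin(users, me, 200, "me").map((u) => u.userTitle));
```

With a 200-mile radius only the "Nearby" user survives the filter, which mirrors what the table view ends up showing: everyone in range except yourself.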

The result of the fireNearUsersQuery call will be somewhat like below, where 3 nearby users (within a 200-mile radius) are available for chat:

iPhone video chat

Inside LSViewController, the m_userTableView gets populated from m_userArray. Each row in the table view has a green Call button. When you tap that button, call is initiated for that user as the receiver ID. What call? The code we just covered to store the session: saveSessionToParse. Who calls it? Well, now it’s time the video chat scene (LSStreamingViewController) takes charge. Before proceeding, take a look at this activity flow – the big picture. You will come back to it quite often as you read on:

iPhone Video chat

Upon tapping the green phone call button, a segue is performed to transition to LSStreamingViewController. Inside LSStreamingViewController, [ParseHelper saveSessionToParse:] is called. Here is that part:

//LSViewController.m
- (void) startVideoChat:(id) sender
{
    UIButton * button = (UIButton *)sender;

    if (button.tag < 0 || button.tag >= (NSInteger)[m_userArray count]) //out of bounds
    {
        [ParseHelper showAlert:@"User is no longer online."];
        return;
    }

    NSMutableDictionary * dict = [m_userArray objectAtIndex:button.tag];
    NSString * receiverID = [dict objectForKey:@"userID"];
    m_receiverID = [receiverID copy];
    [self goToStreamingVC];
}

- (void) goToStreamingVC
{
    //[self presentModalViewController:streamingVC animated:YES];
    //
    [self performSegueWithIdentifier:@"StreamingSegue" sender:self];
}

-(void) prepareForSegue:(UIStoryboardSegue *)segue sender:(id)sender
{
    if ([segue.identifier isEqualToString:@"StreamingSegue"])
    {     
        UINavigationController * navcontroller =  (UINavigationController *) segue.destinationViewController;        
        LSStreamingViewController * streamingVC =  (LSStreamingViewController *)navcontroller.topViewController;        
        streamingVC.callReceiverID = [m_receiverID copy];    
        if (bAudioOnly)
        {
            streamingVC.bAudio = YES;
            streamingVC.bVideo = NO;
        }
        else
        {
            streamingVC.bAudio = YES;
            streamingVC.bVideo = YES;
        }
    }
}

Once inside LSStreamingViewController:

//LSStreamingViewController.m
- (void) viewDidAppear:(BOOL)animated
{
    [super viewDidAppear:animated];

    //a non-empty receiver ID means an outgoing call; nil or @"" means incoming.
    if (self.callReceiverID.length > 0)
    {
        m_mode = streamingModeOutgoing; //generate session
        [self initOutGoingCall];
        //connect, publish/subscriber -> will be taken care by
        //sessionSaved observer handler.
    }
    else
    {
        m_mode = streamingModeIncoming; //connect, publish, subscribe
        m_connectionAttempts = 1;
        [self connectWithPublisherToken];
    }
}

- (void) initOutGoingCall
{
    NSMutableDictionary * inputDict = [NSMutableDictionary dictionary];
    [inputDict setObject:[ParseHelper loggedInUser].objectId forKey:@"callerID"];
    [inputDict setObject:appDelegate.userTitle forKey:@"callerTitle"];
    [inputDict setObject:self.callReceiverID forKey:@"receiverID"];
    [inputDict setObject:[NSNumber numberWithBool:self.bAudio] forKey:@"isAudio"];
    [inputDict setObject:[NSNumber numberWithBool:self.bVideo] forKey:@"isVideo"];
    m_connectionAttempts = 1;
    [ParseHelper saveSessionToParse:inputDict];
}

LSStreamingViewController handles both outgoing and incoming calls. To tell the two apart, it uses the receiver ID (self.callReceiverID): for outgoing calls, it holds the value supplied by LSViewController (see the segue transition code). For incoming calls there is no need for it, so it is nil or empty.
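
The nil-or-empty rule can be sketched as a tiny helper. This is only an illustration: LSCallDirection and LSCallDirectionForReceiverID are hypothetical names that do not exist in the project, which keeps its own m_mode value instead. Checking the string's length covers both cases at once, because a message sent to nil returns 0.

```objc
#import <Foundation/Foundation.h>

// Hypothetical names for illustration; the project itself stores the result
// in m_mode as streamingModeOutgoing / streamingModeIncoming.
typedef NS_ENUM(NSInteger, LSCallDirection) {
    LSCallDirectionIncoming = 0,
    LSCallDirectionOutgoing = 1
};

// A nil or empty receiver ID means the call is incoming.
// Messaging nil yields 0, so `length > 0` rejects both nil and @"".
static LSCallDirection LSCallDirectionForReceiverID(NSString *receiverID)
{
    return (receiverID.length > 0) ? LSCallDirectionOutgoing
                                   : LSCallDirectionIncoming;
}
```

With this helper, a receiver ID passed in from the user list gives the outgoing direction, while both @"" and nil give the incoming one.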

As soon as saveSessionToParse saves the ActiveSessions object to the Parse.com database, it notifies LSStreamingViewController, so that the sessionID, publisherToken and subscriberToken values from Opentok (now available on the app’s delegate) can be used by LSStreamingViewController. This notification (kSessionSavedNotification) is handled by sessionSaved like this:

//LSStreamingViewController.m
- (void) sessionSaved
{
    [self connectWithSubscriberToken];
}
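
How the handler gets hooked up to the notification is not shown in this tutorial. A minimal sketch of such wiring with NSNotificationCenter could look like the following. SessionListener, kDemoSessionSavedNotification and stopListening are made-up names for illustration, and the real registration presumably lives somewhere in LSStreamingViewController. Apple's documentation asks for a handler that takes a single NSNotification argument, so the sketch uses sessionSaved: with a parameter.

```objc
#import <Foundation/Foundation.h>

// Assumed value; the project defines its own kSessionSavedNotification.
static NSString * const kDemoSessionSavedNotification = @"DemoSessionSavedNotification";

// Hypothetical stand-in for LSStreamingViewController.
@interface SessionListener : NSObject
@property (nonatomic, assign) NSInteger savedCount;
- (void)stopListening;
@end

@implementation SessionListener

- (instancetype)init
{
    self = [super init];
    if (self)
    {
        // Ask to be told whenever a session has been saved.
        [[NSNotificationCenter defaultCenter] addObserver:self
                                                 selector:@selector(sessionSaved:)
                                                     name:kDemoSessionSavedNotification
                                                   object:nil];
    }
    return self;
}

// In the real controller, this is the place to connect to the session.
- (void)sessionSaved:(NSNotification *)notification
{
    self.savedCount += 1;
}

// Unregister once the screen is done, so no further notifications arrive.
- (void)stopListening
{
    [[NSNotificationCenter defaultCenter] removeObserver:self];
}

@end
```

Posting kDemoSessionSavedNotification on the default center calls sessionSaved: on every registered listener before the post call returns, so the saving side needs to know nothing about the view controller.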

In the forthcoming section we will see how the above call makes video chat fully seamless between two users, without Parse intervention.

Huh.. the mammoth has been laid to rest, but there is still life in it. We already covered the session generation part. But how does the other user know about it? And when exactly does Opentok take charge to start the exciting video?

Step 2 – Handle Incoming Call:

Handling an incoming call is the tricky bit. Let’s list the bare minimum necessities:

  • You need to poll the database for a session destined for you (the logged-on user), that is, search for an ActiveSessions record where the current user is listed as the receiver.
  • You need to ensure that the database is up to date once the call has been established, that is, remove the session row once the sessionID and tokens have been read into the iPhone app.
  • You also need to signal interruptions while a session is ON, that is, inform the caller gracefully that the receiver is busy on another call. For simplicity’s sake, we aren’t handling multi-user calls (conference) right now, although they can be handled quite easily.

Recall that in alertView:clickedButtonAtIndex: we saw a call to [appDelegate fireListeningTimer], and now it is time to expand on it, because it accomplishes the first of the three tasks listed above: it fires a timer that continually polls the Parse.com ActiveSessions table for calls destined for the current user.

//LSAppDelegate.m
//this method will be called once logged in. It will poll parse ActiveSessions object
//for incoming calls.
-(void) fireListeningTimer
{
    if (self.appTimer && [self.appTimer isValid])
        return;

    self.appTimer = [NSTimer scheduledTimerWithTimeInterval:8.0
                                                     target:self
                                                   selector:@selector(onTick:)
                                                   userInfo:nil
                                                    repeats:YES];
    [ParseHelper setPollingTimer:YES];  
    NSLog(@"fired timer");
}

-(void)onTick:(NSTimer *)timer
{
    NSLog(@"OnTick");
    [ParseHelper pollParseForActiveSessions];  
}

As its name suggests, [ParseHelper pollParseForActiveSessions] polls the ActiveSessions table for sessions calling out to this user, that is, rows whose receiverID equals the currently logged-on user’s object ID.

 //ParseHelper.m
 //poll parse ActiveSessions object for incoming calls.
 +(void) pollParseForActiveSessions
 {
     __block PFObject *activeSession;

     if (!bPollingTimerOn)
         return;

     PFQuery *query = [PFQuery queryWithClassName:@"ActiveSessions"];

     NSString* currentUserID = [self loggedInUser].objectId;
     [query whereKey:@"receiverID" equalTo:currentUserID];  

     [query findObjectsInBackgroundWithBlock:^(NSArray *objects, NSError *error)
      {
          if (!error)
          {
              LSAppDelegate * appDelegate = [[UIApplication sharedApplication] delegate];

              //no rows means nobody is calling; just wait for the next tick.
              if (objects.count > 0)
              {
                  activeSession = (PFObject *)[objects objectAtIndex:0];                 
                  appDelegate.sessionID = activeSession[@"sessionID"];
                  appDelegate.subscriberToken = activeSession[@"subscriberToken"];
                  appDelegate.publisherToken = activeSession[@"publisherToken"];
                  appDelegate.callerTitle = activeSession[@"callerTitle"];
                 // future use:
                  //appDelegate.bAudioCallOnly = !([activeSession[@"isVideo"] boolValue]);

                  //done with backend object, remove it.
                  [self setPollingTimer:NO];
                  [self deleteActiveSession];

                  NSString *msg = [NSString stringWithFormat:@"Incoming Call from %@, Accept?", appDelegate.callerTitle];                  
                  UIAlertView *incomingCallAlert = [[UIAlertView alloc] initWithTitle:@"LiveSessions" message:msg delegate:self cancelButtonTitle:@"No" otherButtonTitles:@"Yes", nil];                 
                  incomingCallAlert.tag = kUIAlertViewTagIncomingCall;
                  [incomingCallAlert show];                 
              }
          }
          else
          {
              NSString * msg = [NSString stringWithFormat:@"Failed to retrieve active session for incoming call. Please try again. %@", [error localizedDescription]];
              [self showAlert:msg];
          }
     }];
}

The method is quite self-explanatory: whenever it finds an ActiveSessions object, it copies the fields it needs (sessionID, publisherToken, subscriberToken and callerTitle) into the app delegate’s properties. Once done, it deletes the object from the Parse.com backend with the [self deleteActiveSession] call. [self setPollingTimer:NO] keeps things in sync: it ensures that the timer doesn’t fire another polling query through pollParseForActiveSessions after an object has been found and while its deletion is in progress.
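
The body of setPollingTimer: is not listed in this tutorial, so here is a rough sketch of how such a gate could work, assuming it is nothing more than a static flag. PollingGate, bDemoPollingOn and pollIfAllowed are hypothetical names and not the project's own code.

```objc
#import <Foundation/Foundation.h>

// Hypothetical stand-in for the polling part of ParseHelper.
@interface PollingGate : NSObject
+ (void)setPollingTimer:(BOOL)on;
+ (BOOL)pollIfAllowed;   // returns YES when a query would be sent
@end

// Assumed to be a plain static flag, off until the user has logged in.
static BOOL bDemoPollingOn = NO;

@implementation PollingGate

+ (void)setPollingTimer:(BOOL)on
{
    bDemoPollingOn = on;
}

+ (BOOL)pollIfAllowed
{
    // The NSTimer keeps ticking; the flag decides whether a tick does any work.
    if (!bDemoPollingOn)
        return NO;

    // ... the real method would run the ActiveSessions query here ...
    return YES;
}

@end
```

The point of this design is that the timer itself is never invalidated. Ticks are simply skipped while the incoming-call alert is up or a call is in progress, and polling resumes as soon as the flag is set back to YES.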

Once the ActiveSessions values are copied to the app’s delegate, more important work is waiting: the user needs to be notified of an incoming call. incomingCallAlert performs this task, and here is the result:

 iPhone video chat

What’s more important is incomingCallAlert’s delegate, which we already visited in Step 1 – let’s go over it again:

//ParseHelper.m
+ (void)alertView:(UIAlertView *)alertView clickedButtonAtIndex:(NSInteger)buttonIndex
{
    if (kUIAlertViewTagUserName == alertView.tag)
    {
        //let's defer saving the title till we have the location.
        //saveuserwithlocationtoparse will handle it.
        LSAppDelegate * appDelegate = [[UIApplication sharedApplication] delegate];
        appDelegate.userTitle = [[alertView textFieldAtIndex:0].text copy];
        appDelegate.bFullyLoggedIn = YES;

        //fire appdelegate timer
        [appDelegate fireListeningTimer];
        [[NSNotificationCenter defaultCenter] postNotification:[NSNotification notificationWithName:kLoggedInNotification object:nil]];
    }
    else if (kUIAlertViewTagIncomingCall == alertView.tag)
    {
        if (buttonIndex != [alertView cancelButtonIndex])   //accept the call
        {
            //accept the call
            [[NSNotificationCenter defaultCenter] postNotification:[NSNotification notificationWithName:kIncomingCallNotification object:nil]];
        }
        else
        {
            //user did not accept call, restart timer 
            //start polling for new call.
            [self setPollingTimer:YES];
        }
    }
}

If the user does not accept the call, the polling flag is set again and the app resumes looking for a new incoming call session. If the user accepts the call, kIncomingCallNotification is posted, which notifies LSViewController that a call has arrived. The rest happens in LSViewController, which in turn hands over to LSStreamingViewController.

//LSViewController.m
//if and when a call arrives
- (void) didCallArrive
{    
     //pass blank because call has arrived, no need for receiverID.
     m_receiverID = @"";
     [self goToStreamingVC];
}

didCallArrive fires in response to kIncomingCallNotification, and all it does is empty m_receiverID to indicate that the call is destined for self: an incoming call. This, as we already saw in prepareForSegue, is enough to signal LSStreamingViewController that the call is to be handled as an incoming call, so no new ActiveSessions object needs to be stored. All that is left is to use the sessionID and token values to connect to the Tokbox streaming server.

So far, we discussed both cases, and in both we obtained sessionID, publisherToken and subscriberToken into our app’s delegate. We finally passed control over to LSStreamingViewController. In case of an outgoing call, [LSStreamingViewController sessionSaved] calls [LSStreamingViewController connectWithSubscriberToken]. In case of an incoming call, as we already saw in [LSStreamingViewController viewDidAppear:], a call is made to [LSStreamingViewController connectWithPublisherToken].

//LSStreamingViewController.m
- (void) connectWithPublisherToken
{
    NSLog(@"connectWithPublisherToken");
    [self doConnect:appDelegate.publisherToken :appDelegate.sessionID];
}

- (void) connectWithSubscriberToken
{
    NSLog(@"connectWithSubscriberToken");    
    [self doConnect:appDelegate.subscriberToken :appDelegate.sessionID];
}

- (void)doConnect : (NSString *) token :(NSString *) sessionID
{
    _session = [[OTSession alloc] initWithSessionId:sessionID
                                           delegate:self];
    [_session addObserver:self forKeyPath:@"connectionCount"
                  options:NSKeyValueObservingOptionNew
                  context:nil];
    [_session connectWithApiKey:kApiKey token:token];
}

The only difference between the two is the token they use, and Opentok isn’t quite clear about what changes if you use one token instead of the other (publisher token or subscriber token; as you remember, the tokens are generated from cloud code in the beforeSave trigger). Irrespective of which one you use to connect to a session, it allows you to publish your stream (your camera feed) as well as subscribe to the other user’s stream.

Once the [_session connectWithApiKey:token:] call is made, Opentok takes over. If you recall the analogy, the teller counter finally puts the token guy out of the way, and starts serving its customer. All you need to remember is that your streaming view controller (LSStreamingViewController) must implement these protocols: OTSessionDelegate, OTSubscriberDelegate, OTPublisherDelegate. See the activity flow diagram above again: there, actions initiated by the iOS app are listed in yellow, and delegate methods are marked in green. These delegate methods are part of the three protocols that LSStreamingViewController must implement. As Opentok calls them along the flow, you need to take various actions to make that enticing video available to your user.

Now it no longer matters whether you are a caller or a receiver, as long as you implement the necessary delegate methods from Opentok. The Broadcast tutorial from Opentok has all the implementation details, and I have followed it bit by bit, apart from my own UI modifications. For example, if you choose to view your own stream as soon as the session gets connected, the following code accomplishes it:

//LSStreamingViewController.m
- (void)sessionDidConnect:(OTSession*)session
{ 
    NSLog(@"sessionDidConnect: %@", session.sessionId);
    NSLog(@"- connectionId: %@", session.connection.connectionId);
    NSLog(@"- creationTime: %@", session.connection.creationTime);
    [self.disconnectButton setHidden:NO];
    [self.view bringSubviewToFront:self.disconnectButton];

    self._statusLabel.text = @"Connected, waiting for stream...";  
    [self.view bringSubviewToFront:self._statusLabel];

    [self doPublish];
}

- (void)doPublish
{
    _publisher = [[OTPublisher alloc] initWithDelegate:self name:UIDevice.currentDevice.name];
    _publisher.publishAudio = self.bAudio;
    _publisher.publishVideo = self.bVideo;
    [_session publish:_publisher];

    //symmetry is beauty.
    float x = 5.0;
    float y = 5.0;
    float publisherWidth = 120.0;
    float publisherHeight = 120.0;

    [_publisher.view setFrame:CGRectMake(x,y,publisherWidth,publisherHeight)];
    [self.view addSubview:_publisher.view];
    [self.view bringSubviewToFront:self.disconnectButton];
    [self.view bringSubviewToFront:self._statusLabel];

    NSLog(@"%f-%f-%f-%f", _publisher.view.frame.origin.x, _publisher.view.frame.origin.y, _publisher.view.frame.size.width, _publisher.view.frame.size.height);

    _publisher.view.layer.cornerRadius = 10.0;
    _publisher.view.layer.masksToBounds = YES;
    _publisher.view.layer.borderWidth = 5.0;
    _publisher.view.layer.borderColor = [UIColor yellowColor].CGColor;
}

In the code above, the [_session publish:] call prompts the user to allow access to his / her camera feed, and as soon as he / she allows it, LiveSessions starts publishing the camera feed to the Opentok streaming server. A crucial piece to remember here is the following call:

_publisher.view setFrame

The SDK is designed such that without this, you never get to see your own feed. And no, an indirect method of setting the frame (e.g. addSubview: to a container view) doesn’t work. At the same time you can decorate your publisher view. For example, we have changed features like border color and corner radius.

Seeing the feed of the other user in the same session doesn’t happen at any fixed point in the flow. All you need to do is implement the necessary delegate methods so that as soon as you start receiving that feed, you get an opportunity to configure it fully, like this:

//LSStreamingViewController.m
- (void)subscriberDidConnectToStream:(OTSubscriber*)subscriber
{
    NSLog(@"subscriberDidConnectToStream (%@)", subscriber.stream.connection.connectionId);

    float subscriberWidth = [[UIScreen mainScreen] bounds].size.width;
    float subscriberHeight = [[UIScreen mainScreen] bounds].size.height - self.navigationController.navigationBar.frame.size.height;

    NSLog(@"screenheight %f", [[UIScreen mainScreen] bounds].size.height);
    NSLog(@"navheight %f", self.navigationController.navigationBar.frame.size.height);

    //fill up entire screen except navbar.
    [subscriber.view setFrame:CGRectMake(0, 0, subscriberWidth, subscriberHeight)];

    [self.view addSubview:subscriber.view];
    self.disconnectButton.hidden = NO;

    if (_publisher)
    {
        [self.view bringSubviewToFront:_publisher.view];
        [self.view bringSubviewToFront:self.disconnectButton];
        [self.view bringSubviewToFront:self._statusLabel];
    }
    subscriber.view.layer.cornerRadius = 10.0;
    subscriber.view.layer.masksToBounds = YES;
    subscriber.view.layer.borderWidth = 5.0;
    subscriber.view.layer.borderColor = [UIColor lightGrayColor].CGColor;

    self._statusLabel.text = @"Connected and streaming...";
    [self.view bringSubviewToFront:self._statusLabel];
}

The subscriberDidConnectToStream: delegate method allows you to configure your own subscriber view. Again, the setFrame statement is crucial: if you don’t include it, or do it with wrong values, you may never get to see the other user’s feed, something that can break your (and your users’) heart! Again, you can do your own UI modifications inside the same delegate method, such as reporting the current status (using _statusLabel) and decorating the subscriber view.

There are plenty of other delegate methods in the Opentok SDK that you can use to add features and smarten up your app. For example, see how I chose to handle the subscriber:didFailWithError: delegate method:

//LSStreamingViewController.m
- (void)subscriber:(OTSubscriber *)subscriber didFailWithError:(OTError *)error
{
    NSLog(@"subscriber: %@ didFailWithError: ", subscriber.stream.streamId);
    NSLog(@"- code: %ld", (long)error.code);
    NSLog(@"- description: %@", error.localizedDescription);
    self._statusLabel.text = @"Error receiving video feed, disconnecting...";
    [self.view bringSubviewToFront:self._statusLabel];
    [self performSelector:@selector(doneStreaming:) withObject:nil afterDelay:5.0];
}

- (IBAction)doneStreaming:(id)sender
{
    [self disConnectAndGoBack];
}

- (void) disConnectAndGoBack
{
    [self doUnpublish];
    [self doDisconnect];
    self.disconnectButton.hidden = YES;
    [ParseHelper deleteActiveSession];

    //set the polling on.
    [ParseHelper setPollingTimer:YES];
    [self dismissViewControllerAnimated:YES completion:nil];
}

There is much more inside LSStreamingViewController that needs little or no explanation for someone who knows UIKit well. So we proudly declare that the mammoth may have stopped breathing. You can see it for yourself:

iPhone video chat

By the way, who is that insane soul screaming in the publisher view? LiveSessions isn’t that smart – what it shows there is what your iPhone’s camera sees!

And yeah, if you didn’t notice, the sleeping beast is an elephant, not a mammoth. Mammoths existed only in ice age. So is this real?

Who knows, it’s just virtual. But so are nerds trolling in chat rooms.

All we need now is to discard the remnants to smooth the flow of our iPhone video chat. Let’s do them in one go.

The Cleanup:

As a VOIP app, LiveSessions must either keep running in the background or do its share of cleanup as soon as it enters the background. To preserve simplicity, I chose the latter. Apple lays down some rules for performing any task in the background, be it trivial or not. Within LSAppDelegate.m, we clean up our back end following those rules.

- (void)applicationDidEnterBackground:(UIApplication *)application
{    
    backgroundTask = [application beginBackgroundTaskWithExpirationHandler:^{

        // Clean up any unfinished task business by marking where you        
        // stopped or ending the task outright.        
        [application endBackgroundTask:backgroundTask];        
        backgroundTask = UIBackgroundTaskInvalid;        
    }];

    // Start the long-running task and return immediately.    
    dispatch_async(dispatch_get_global_queue(DISPATCH_QUEUE_PRIORITY_DEFAULT, 0),
    ^{
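        // Do the work associated with the task, preferably in chunks.
        // Note: both delete calls below are asynchronous (they return before
        // Parse has finished), so they may still be in flight when the task ends.
        [ParseHelper deleteActiveSession];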
        [ParseHelper deleteActiveUser];
        [application endBackgroundTask:backgroundTask];        
        backgroundTask = UIBackgroundTaskInvalid;        
    });    
}

And we also wake up following those rules:

- (void)applicationWillEnterForeground:(UIApplication *)application
{
    // Called as part of the transition from the background to the inactive state; here you can undo many of the changes made on entering the background.
    self.bFullyLoggedIn = NO;
    [ParseHelper initData];
    [ParseHelper anonymousLogin];    
}

And here is what we call in the above code:

+ (void) deleteActiveSession
{
    NSLog(@"deleteActiveSession");
    LSAppDelegate * appDelegate = [[UIApplication sharedApplication] delegate];
    NSString * activeSessionID = appDelegate.sessionID;

    if (!activeSessionID || [activeSessionID isEqualToString:@""])
        return;

    PFQuery *query = [PFQuery queryWithClassName:@"ActiveSessions"];
    [query whereKey:@"sessionID" equalTo:appDelegate.sessionID];

    [query getFirstObjectInBackgroundWithBlock:^(PFObject *object, NSError *error)
    {
        if (!object)
        {
            NSLog(@"No session exists.");     
        }
        else
        {
            // The find succeeded.
            NSLog(@"Successfully retrieved the object.");
            [object deleteInBackgroundWithBlock:^(BOOL succeeded, NSError *error)
            {
                if (succeeded && !error)
                {
                    NSLog(@"Session deleted from parse");                   
                }
                else
                {
                    //[self showAlert:[error description]];
                    NSLog(@"%@", [error description]);
                }
            }];
        }
    }];
}

+ (void) deleteActiveUser
{
    NSString * activeUserobjID = [self activeUserObjectID];
    if (!activeUserobjID || [activeUserobjID isEqualToString:@""])
        return;

    PFQuery *query = [PFQuery queryWithClassName:@"ActiveUsers"];
    [query whereKey:@"userID" equalTo:activeUserobjID];

    [query getFirstObjectInBackgroundWithBlock:^(PFObject *object, NSError *error)
    {
        if (!object)
        {
            NSLog(@"No such user exists.");
        }
        else
        {
            // The find succeeded.
            NSLog(@"Successfully retrieved the ActiveUser.");
            [object deleteInBackgroundWithBlock:^(BOOL succeeded, NSError *error)
             {
                 if (succeeded && !error)
                 {
                     NSLog(@"User deleted from parse");
                     activeUserObjectID = nil;
                 }
                 else
                 {
                     //[self showAlert:[error description]];
                      NSLog(@"%@", [error description]);
                 }
             }];
        }
    }];
}

+(void) initData
{
    if (!objectsUnderDeletionQueue)
        objectsUnderDeletionQueue = [NSMutableArray array];
}

+ (BOOL) isUnderDeletion : (id) argObjectID
{
    return [objectsUnderDeletionQueue containsObject:argObjectID];
}

Both delete functions do what is expected: they delete the ActiveSessions and ActiveUsers objects from the Parse.com database. There is nothing unusual in what they do. The objectsUnderDeletionQueue array is our way of keeping things in sync: while Parse.com is busy deleting stuff in the background, it prevents our app from repeatedly firing delete commands.
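
The listings above do not show where the queue gets filled or emptied, so the following is only a sketch of one way such a guard could be used around a delete call. DeletionGuard, beginDeleting:, finishDeleting: and demoDeletionQueue are hypothetical names, and the sketch assumes all calls happen on the same thread.

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper; none of these names come from the project.
@interface DeletionGuard : NSObject
// Returns YES if the caller should go ahead and send the delete request.
+ (BOOL)beginDeleting:(NSString *)objectID;
// Call from the delete completion block, whether it succeeded or failed.
+ (void)finishDeleting:(NSString *)objectID;
+ (BOOL)isUnderDeletion:(NSString *)objectID;
@end

static NSMutableArray *demoDeletionQueue = nil;

@implementation DeletionGuard

// Lazily creates the shared queue of object IDs with a delete in flight.
+ (NSMutableArray *)queue
{
    if (!demoDeletionQueue)
        demoDeletionQueue = [[NSMutableArray alloc] init];
    return demoDeletionQueue;
}

+ (BOOL)beginDeleting:(NSString *)objectID
{
    // Nothing to delete, or a delete for this object is already running.
    if (objectID.length == 0 || [[self queue] containsObject:objectID])
        return NO;

    [[self queue] addObject:objectID];
    return YES;
}

+ (void)finishDeleting:(NSString *)objectID
{
    if (objectID)
        [[self queue] removeObject:objectID];
}

+ (BOOL)isUnderDeletion:(NSString *)objectID
{
    return objectID != nil && [[self queue] containsObject:objectID];
}

@end
```

A delete method would call beginDeleting: first and return early on NO, then call finishDeleting: inside the deleteInBackgroundWithBlock: completion block. A second delete request for the same object, fired while the first is still running, is then dropped instead of reaching Parse.com.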

Sign off and Giving Ins:

This tutorial and the code that comes with it can serve as a bare backbone for your next cutting-edge messenger app. You can do your own customizations for Parse user management or UI layout, and you can use it freely to learn, teach or sell, with the only obligation being to cite it back to me, which you must do, and which will make me extremely happy.

Keep chatting…
