Integrating voice & web: best practices

November 17, 2008

After having seen many developers and partners integrate MyVox into their applications over the past year, I think it’s fair to say there are better and worse ways to help your users understand and want to use phone interactivity in a Web product.  Here are a few lessons learned, both from observing others and from our own experiences over the past eight years:

1)  Give users context.  Say you are using the phone to do voice recording.  Most users are probably used to the idea of using a microphone to record for the Web, but not the phone.  If you don’t make it clear up front that the phone will be used, not only will you confuse some users who expected to use a mike, but – even worse – you’ll turn away users who don’t have or know how to use microphones, despite the fact that you could have serviced them wonderfully over the phone.  Tell or show users ASAP that the phone is the medium.

2)  Tell people why the phone is good for them.  Remind them that they have a phone with them all the time, that they can add your app to their address book or speeddial, that the phone makes it so easy for them to participate in your application anytime, any place, that phones record better audio than most computer mikes, and so on.  Tailor this to your application, of course.

3)  Show users what is happening during a call within the Web app.  Provide realtime updates on the call’s progress.  Not only is this informative, helping users to understand what they are doing, but it’s also something of a magical experience for new users to see something they are doing on their phone tied to the Web experience.  A good example of this principle in action is our VoxPix application.  When you take a slideshow and add voice narration to it with VoxPix, as you record the narration for each slide, you see the slide appear on the Web app, and are told you are recording for that slide.  This provides that magical fun, and also makes the process of recording audio for twelve photos much less confusing than if you were simply told “record slide number eight after the tone” without visual feedback.

4)  Iconography is good.  You may not have much space to work with in your application, particularly if the phone is only one small piece of the puzzle, or if you are developing for a limited-space environment like Facebook.  Here, a simple icon of a phone, handset, or touchtone keypad can convey much of what you want, in very little space.

5)  Think about international callers.  Not only do they want a phone number that won’t create international long distance charges, but they also need to be addressed in their home language, so you may need multiple versions of your application (or at least multiple “voice skins”).  If you can, try offering access through VoIP providers as well.  There are a lot of Skype users out there, and it’s becoming easier and easier to offer a direct Skype address for your users to call.

6)  Customize the audio experience.  The phone is a great branding and messaging opportunity; audio and voice can convey mood in a way that’s much more difficult to pull off in text on a screen.  If you have a radio or TV presence, bring your voice talent or audio iconography to the phone prompts.  If not, consider using hired voice talent that matches your brand and target demographic.  Keep the prompts short, and make sure the voice talent enunciates.

7)  If you are using menus or speech reco, be very, very clear in your prompts and menus/grammars.  We’ve all been through touchtone hell; don’t contribute to the problem.  If you choose speech reco, either keep things extremely simple, or go ahead and hire an expert.  Otherwise, you risk users speaking words or phrases you haven’t anticipated, and creating confusion.  When in doubt, choose the lower-tech (and usually cheaper) alternative of touchtone.

8)  Provide readily available help.  This might seem like a no-brainer, but almost nobody does it (and we are guilty of this too).  A simple paragraph of mouseover help might save 10% of user experiences that will otherwise fail.  10% is a huge upside for such a minor, quick measure.


IT Expo and the appetite for voice mashups

September 25, 2008

IT Expo is not normally a conference I look forward to with any great relish.  My impression of IT Expo is that the sessions stay basically the same from year to year; the expo floor varies a bit more, but it still takes at most two or three hours to explore the floor and have the conversations you want to have.  When asked to speak at the conference, I was pleased, but perhaps not as excited as I might otherwise have been, having low expectations.  Was this the crowd that would see value in voice mashups?

To my great surprise, the answer was a big “yes.”  On the conference floor, I met companies like Intelepeer that are not only building their own mashups, but also working to create their own developer programs and to engage that critical Web developer in the telephony space.  Throughout the hall, I saw “API”, “Developers”, and other keywords that didn’t always mean “reaching non-telco developers”, but did at least mean that people are recognizing value in reducing the barriers to working with their products.

Better still was the session itself.  I met Thomas Howe for the first time, who was the other speaker at my session.  Knowing his work and focus, I knew we’d have a lot to talk about, but didn’t realize just how much unity of vision there would be.  He noted afterwards how nice it was to talk to somebody who “got it”, and I felt the same way; so much of the language he was using to describe the voice mashups space and its relevance was deeply familiar.  When you are telling a relatively new story, you can feel a little out there at times, and it’s good to have that comforting reminder that others see the opportunity as well.  After all, there’s lots of room for many people to play in this space.

The carriers are beginning to pay attention, and not just the Europeans, who were the first to recognize the importance of simplifying the telco application development process in order to expand the market for usage of their voice minutes (witness BT’s Web 21C program, and their subsequent purchase of Ribbit).  Thomas had a great presentation explaining the business case for the carriers, which you can see here; my thirty-second summary is this:

  • Carriers have excess capacity
  • Voice is a commodity, and one whose price sure isn’t going to go up
  • Voice has value in a wide variety of apps
  • To produce these apps, and drive more usage, carriers need to make it easy to build apps around their capacity, not try to write the apps themselves

It’s a problem both of expertise and personpower. So many of these apps are specific to a vertical or a brand’s needs or a specific consumer group.  There’s no way the carriers can or should develop the product expertise for each niche product.  It’s partly a long-tail problem, and partly just a matter of what your core competency is.  As for personpower, the potential marketplace for such applications is pretty tremendous.  To use a flawed analogy, you wouldn’t ask all Web development to be done by a few software houses, right?  You want to spread the work around, and take advantage of developers and entrepreneurs with a variety of skillsets and levels of experience.

My own presentation didn’t have Thomas’s great metaphors going for it, but you can see it here anyway if you’d like.  The topic is simply an explanation of voice mashups, why they matter, and what developers are looking for out of a voice mashup development environment.  A lot of people requested it after the session, so I guess it’s not bad – just a little plain-Jane.  One more thing to work on…


Phweet and consumer voice mashups

August 5, 2008

It can be a little frustrating at times to see how much of the conversation around voice mashups centers on enterprise applications. This is not because there is no place for voice mashups in the enterprise – quite the opposite. But focusing all of our collective attention on this particular kind of deployment misses the range of consumer-facing mashups that are now possible. As both technology and potential revenue streams align to make it possible for individual developers without a drop of telephony blood running through their veins to build apps that mash the phone right in with everything else, consumer-facing mashups are going to be where most of the action takes place, simply because that’s what individual developers like to build.

We have to remember that a huge part of the world of mashups (I’m really talking about Web mashups here, where we have a couple years of experience and data to draw on) is simple concepts executed in the spirit of fun and experimentation, thrown out into the consumer world, with the occasional app sticking and growing. Just look at a directory of Google Maps mashups, or those at ProgrammableWeb, to see. We saw this overemphasis on the enterprise play out again a couple of weeks ago in Dameon Welch-Abernathy’s article “Is There Money in Voice APIs” – or, more specifically, in the nice line of commentary that the article produced. You can see here how many people are thinking about voice mashups, and see value in the space, but how few are considering the consumer space. I believe this is a real mistake.

So it was a pleasant surprise this week to see the alpha launch of Phweet, a service that uses Twitter to invite friends, colleagues, and other watchers of your Twitter to join you on in-browser VoIP phone calls. While the basic principle has been seen before (Calliflower offers a related service for Facebook, oriented around conference calls, and we at MyVox built something very similar for IM-based invitations to calls a couple of years ago), Phweet may really have hit on something in the use of Twitter as the invite and notification medium du jour. They’ve had pretty good traffic in their first week for a service that’s spread through word-of-mouth, and many telephony bloggers have taken notice. And Phweet is truly a voice mashup, combining the Twitter API with TringMe (for the Flash-based VoIP piece) and Televolution’s in order to build something fundamentally new and cool – for consumers.

What’s been particularly enjoyable is seeing how conscious the Phweet guys are of how exciting it is to be able to build a service like theirs, and how much credit they give to the existence of the APIs on which they depend. And then there’s this observation from their blog:

Phweet proves both as a reference service and in terms of potential that web/voice/social-media mashups are the future…We believe that Phweet and others can sit at the intersection of the web and telephony.

Couldn’t have said it any better.


Giving everyone a mic…

July 17, 2008

As you may recall, WeMix.com uses MyVox technology to create a kind of universal access recording studio—artists use the phone as microphone to record their raps and music. The idea is taking off. Fortune wrote, “This week’s favorite is Wally J.’s ribald ‘Booty On My YouTube,’ which has been played 614,938 times.” So, it seems as though people are recording, and listening.

If this is working for the WeMix community/ies, then what are the other contexts where this technology would be useful? Where could we use the phone as a microphone to speak to, and create, large interactive communities?


Karaoke & Nike

July 16, 2008

One of our side projects has been getting a phone-based karaoke system up and running. We haven’t been able to talk about this as we built the first version for a Nike promotion, but now that the promotion is live, we can speak about it freely.

Basically, the system lets you make a voice recording while you hear another track – presumably, a music track – coming through the earpiece on your phone. This means you can sync your vocals with the music track, recording to the beat. We then save both the vocals-only track, and a mixed-down version that combines the vocal with the instrumental.
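
For the curious, the mix-down step can be pictured with a few lines of Python. This is only an illustrative sketch, not the code behind the karaoke system: the function name `mix_down`, the gain parameters, and the assumption that both tracks arrive as lists of 16-bit PCM samples at the same sample rate are all made up for the example.

```python
def mix_down(vocal, backing, vocal_gain=1.0, backing_gain=0.6):
    """Mix two tracks of 16-bit PCM samples into one (hypothetical helper).

    The shorter track is padded with silence, and each summed sample is
    clipped to the 16-bit range so loud passages do not wrap around.
    """
    length = max(len(vocal), len(backing))
    mixed = []
    for i in range(length):
        v = vocal[i] if i < len(vocal) else 0      # silence past the end
        b = backing[i] if i < len(backing) else 0
        sample = int(round(v * vocal_gain + b * backing_gain))
        mixed.append(max(-32768, min(32767, sample)))  # clip to int16
    return mixed


# Keep the vocals-only track as recorded, and store the mix alongside it.
vocals_only = [1000, -1000]
instrumental = [500, 500, 500]
mixed_track = mix_down(vocals_only, instrumental, 1.0, 1.0)
```

Keeping the untouched vocal next to the mix is what makes it possible to remix later, for example to drop a winning vocal into the finished track.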

Nike is using the system as part of a promotion around the Summer Olympics and the USA basketball team. You can call in and record a track over Just Blaze’s “Listen To The Anthem” track. If your track is selected as the winner, your vocal becomes part of the Listen To The Anthem track itself, and you get other prizes as well. Here’s the promotion page; to try the line, just call 866-936-USAB (8722).

Karaoke is not yet opened up in the MyVox API, but we may do exactly that.


Voice Maps on Google, and VoxCards

June 27, 2008

Two tidbits:

The Google Maps API folks picked up our Voice Maps app to feature on their site (scroll to the bottom right).  We were very honored that they chose to highlight us; this came out of meeting many of the API developers at Google I/O, where we demoed Voice Maps to them.  I continue to feel that Google is in a unique position to help out not only end developers, but also API services like ours, and bring us all together.

We also added a new app to our gallery:  VoxCards!  VoxCards are electronic greeting cards with a twist:  you can pick up the phone and add voice to any of them.  When the recipient opens your card, she’ll be greeted not just with the visual card, but also your voice.  Check it out!


MyVox 1.1 Launches

June 20, 2008

Yesterday saw the launch of MyVox 1.1, which added a number of new features to the product.  Perhaps the most significant is the concept of the call-initiated app.

In 1.0, every MyVox application was written with the expectation that a user would be sitting in front of some sort of screen in order to start the voice recording process.  That could be a Web page, a desktop widget, a mobile app, even an email or text message… but you needed to take some screen-based action in order to start a MyVox session, and get the phone number necessary to call and make your recording.

We realized a while ago that there was a whole class of applications that opened up if a user could start the recording interaction simply by picking up the phone.  Freed of the need for a screen-based interaction, suddenly you can have people making recordings while they are out and about, untethered.  People can call in from cars, concerts, nightclubs, on bikes, hiking trails, at museums, in classrooms… wherever.  But importantly, the recording can still be tied back to an application, which takes it and makes use of it.

1.1 makes this possible.  These “call-initiated” apps still use the same structure and API; the main differences have to do with how you set up your voice recorder (the core object of the MyVox API – each app has one), and with the assignment of a specific phone number to each application (as opposed to the traditional MyVox app, where a phone number is handed out for each recording session).

1.1 also offers…

* Identification of call-initiated users by caller ID, access code, or both, allowing (among other things) the gating of access to such apps

* One-time collection of a touchtone code at the beginning of the call, to use however you will; think zip code collection, product ID entry, etc.

* Caller ID as an attribute available on any call/recording
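
To make the gating idea concrete, here is a minimal sketch of the kind of check an application might run when a call comes in.  It is not the MyVox API: the class `CallGate`, its fields, and the `admits` method are hypothetical names invented for this example, and the phone numbers and codes are placeholders.

```python
from dataclasses import dataclass, field
from typing import Optional, Set


@dataclass
class CallGate:
    """Hypothetical access policy for a call-initiated app (not MyVox API)."""
    allowed_caller_ids: Set[str] = field(default_factory=set)
    access_codes: Set[str] = field(default_factory=set)
    require_both: bool = False  # True: caller ID AND access code must match

    def admits(self, caller_id: Optional[str], code: Optional[str]) -> bool:
        """Return True if the caller may record into the app."""
        id_ok = caller_id is not None and caller_id in self.allowed_caller_ids
        code_ok = code is not None and code in self.access_codes
        if self.require_both:
            return id_ok and code_ok
        return id_ok or code_ok


# Either credential is enough for this gate.
gate = CallGate(allowed_caller_ids={"+15551230000"}, access_codes={"4321"})
known_caller = gate.admits("+15551230000", None)
stranger = gate.admits("+15559990000", "0000")

# A stricter gate that needs both credentials.
strict = CallGate({"+15551230000"}, {"4321"}, require_both=True)
half_match = strict.admits("+15551230000", None)
```

Here the known caller gets through on caller ID alone, the stranger is turned away, and the stricter gate rejects a caller who matches on caller ID but has not entered the code.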

MyVox 1.11 will be launching very soon as well.  Keep an eye out!

