18 thoughts on “Voice Recognition Is Flying, Needs Focus”

  1. I agree that voice is the next big thing, but it needs to come a lot farther than it has. I hate dealing with any customer service that has voice recognition; it just does not work well enough. Has anyone had any good experiences with it so far? Texting and IM are all fine and dandy, but let’s get some real-world applications set up.


  2. Why would you want to talk all the time? Imagine coming back tired from the office and saying “bring up the Jetsons” versus clicking three buttons on your remote and bringing up the show.

    The mouse/remote/keypad wins hands down (pun intended).

    Sometimes we do not like to talk. HAL was irritating for that reason.

  3. I think voice recognition is a good fit for some scenarios, say looking up numbers or dictating short text messages. Similarly, voice can be very useful for entering addresses and the like into maps.

    Pravin, the scenario here is for mobile phones, not for the home. I guess at home the remote control always wins.

  4. Voice recognition in phones is nothing new. I still remember the earliest Samsung cellphones, with the “Please say Name” (not a typo) voice dialing.

    While voice recognition has come far, it is still a very clunky way to deal with the system, and the novelty wears off after a while. Mistakes by the system are really irritating to deal with when you are trying to get something done quickly.

    On a lighter note, Jeremy Clarkson’s infamous tryst with the Mercedes S’s voice recognition should bring a chuckle. It points out why speech recognition still has its limitations as an input method.

  5. KPN, the Dutch telecom operator, has done many different experiments with voice recognition in live services. It is a difficult technology to make work for millions of people. Very simple tasks can be supported: for example, there is a traffic jam service that you can call, and it tells you whether there is a traffic jam on the road that you name. There is also a directory service for getting phone numbers.
    Better results are obtained in security, though. Voice patterns can be used as a security measure in, for example, banking via phone. As a stand-alone technology it is still just not good enough, in my opinion.

  6. Om, sincerely appreciate your comments. The broad demo was intentional in order to showcase the flexibility of the platform (depending on the endpoint, we’ll have more or fewer options available to the user). Another thing to remember is that mobile consumers oftentimes do not want to switch contexts; if Yap can support 80% of what you’d normally need a mobile browser for, we’ll be successful…press a button, ask for something, boom, and then we get the heck out of your way.

    As for competition from the portals, if their strategic strength were as you portrayed it, the white label search providers (such as JumpTap and Medio) would not exist. You should hang out on the East coast more to reset your perceptions there. 😉 Our aspirations are well aligned with the carriers’ strategies, and we expect long and fruitful partnerships with them. Warm regards and thank you for your insightful questions at the event! i.

  7. I’ve always been a proponent of voice-commanded portable devices, if for no other reason than that a non-physical interface is optimal as the size of the device shrinks to (near) zero.

    Even though I have small fingers, it has seemed apparent to me that using the keypad on a mobile phone, for instance, is just clunky. There’s only so much variety in input you actually need from a mobile appliance, yet this variety is greater than a 12-key keypad can offer.

    The only other input that I think is worthy of a mobile device’s form factor is some kind of eyeball tracking, but we’re not there yet.

    I’m on the fence about the iPhone/iPod touch model, but I can’t get over the idea that a mobile appliance’s access should optimally be limited in scope.

  8. Among the issues I remember from working with voice portals based upon natural language recognition back in 1999 were the database issues.

    Every question should get some sort of response. (For us it was called the beep problem when there was no entry in the database).

    If the playing field is pretty limited, as with the traffic reports, that isn’t a problem, but when you’re trying to roll out numerous services, the number of responses starts to add up.

    I’m not in that field anymore, but what I remember is the number of people it took to train the software so that it would be usable.

  9. Great post

    The general observations that I have made on the speech industry are…

    • Initially the speech technology vendors oversold the capability. There were very few good apps, mostly because the UI paradigm was new and businesses picked applications that the technology was not yet ready for.

    • Soon the technology got better, predominantly because a) processors became faster and b) speech vendors had real user data for training, and providers had design and deployment experience. As a result, real performance results started to show up in the contact center for customer care applications. There was focus.

    Today I believe that we are back to where we were: we have new speech technology and solutions that are emerging on mobile phones and edge devices. The prospects of this technology are very exciting. But we must learn from the past, and focus on the applications that have a high success rate given the readiness of the technology. Users don’t give new technology many chances, hence solution providers must get it right out of the gate in both accuracy and adoption rate. As an example, both Yap and Vlingo have a very appealing enabling technology, but it’s the success of the initial apps that will realize the ultimate potential of these companies (I am sure that what I am saying is not new to them).

    Lastly, some folks will like using speech user interfaces and others won’t; the speech industry is betting that it will find its niche among the competing UI paradigms. Personally, I believe that speech interfaces are not suited for everything or everyone, but given the right solution the value delivered could be great.

  10. @pravin

    But what if you don’t know what channel the Jetsons are on (or what submenu of your VOD service)? It will take a lot more than three keystrokes to get to them. On the other hand, a voice interface coupled with background intelligence would be faster and easier than just a keypad. That’s the thing about voice command: it is not enough for the system to match your speech to the correct phrase; it also has to be able to execute a meaningful series of commands based on that phrase. In the end, VUI is just like GUI: it is the last two letters that are most important.
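Two points from the thread lend themselves to a small illustration: that every utterance should get some sort of response (the "beep problem"), and that a matched phrase still has to be turned into a meaningful series of commands. The Python below is only a rough sketch of that idea, not how any product mentioned here works. The catalogue contents, the command names (`open_menu`, `open_submenu`, `play`, `say`) and the function `plan_commands` are all made up for illustration, and the input is assumed to be text that a recognizer has already produced.

```python
import re

# Hypothetical catalogue: show title -> where it lives in the menus.
# Every value here is invented for illustration.
CATALOGUE = {
    "the jetsons": {"menu": "vod", "submenu": "cartoons"},
}


def plan_commands(utterance):
    """Turn already-recognized text into a list of (command, argument) pairs.

    Always returns at least one command, so there is never a silent
    "no entry in the database" case.
    """
    # Normalize: lowercase and drop punctuation the recognizer may add.
    text = re.sub(r"[^a-z0-9 ]", "", utterance.lower()).strip()

    match = re.match(r"bring up (.+)", text)
    if match:
        title = match.group(1)
        entry = CATALOGUE.get(title)
        if entry:
            # One spoken phrase expands into several device commands.
            return [
                ("open_menu", entry["menu"]),
                ("open_submenu", entry["submenu"]),
                ("play", title),
            ]

    # Fallback: unknown phrase or unknown title still gets a spoken reply.
    return [("say", "Sorry, I could not find that. Please try again.")]
```

Calling `plan_commands("Bring up the Jetsons!")` yields three commands ending in `("play", "the jetsons")`, while an unknown request such as `plan_commands("bring up HAL")` yields the single spoken fallback instead of nothing.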
