Speech to Computer Interface: Are “Robo-Butlers” Our Future?
Computer dictation systems have been around for years, but until recently they were notoriously – and often laughably – inaccurate and error-prone.
These days more people are interacting with technology using the spoken word. The newest generation of voice-enabled devices makes dictating memos, e-mails, and text messages reliable enough to be practical, and voice assistants, such as Apple’s Siri, are available on most smartphones. Voice-driven apps can control smart appliances, furnish directions, answer questions, and give instructions.
But voice is not ready to replace other computer interfaces, such as keyboards and touchscreens, at least not for a while.
“Deep learning” is the technology that enables voice recognition software to decipher human speech: layered neural networks “teach” computers to identify and interpret spoken words. Thanks to deep learning, machines can now transcribe speech more accurately and sound more natural and less robotic.
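For readers curious what part of that “deciphering” looks like in code, here is a minimal sketch of one common final step in deep-learning speech recognizers: collapsing the network’s frame-by-frame character guesses into text (greedy CTC-style decoding). The alphabet, blank symbol, and frame predictions below are invented for illustration, not taken from any particular system.

```python
# Toy sketch: turning per-frame character predictions into a transcript.
# In CTC-style decoding, repeated labels are collapsed and a special
# "blank" symbol separates genuinely repeated characters.

BLANK = "_"  # hypothetical blank symbol

def greedy_decode(frame_labels):
    """Collapse consecutive repeats, then drop blanks."""
    out = []
    prev = None
    for label in frame_labels:
        if label != prev and label != BLANK:
            out.append(label)
        prev = label
    return "".join(out)

# Each entry stands for the most likely character the network
# predicted for one short (~20 ms) slice of audio.
frames = ["h", "h", "_", "e", "e", "l", "_", "l", "l", "o", "o"]
print(greedy_decode(frames))  # -> hello
```

Real systems score many candidate transcripts and apply a language model on top, but this collapsing step is why a second of audio (dozens of frames) yields only a handful of letters.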
However, despite their deep-learning algorithms, computers still can’t carry on coherent conversations. They largely fail to grasp the nuances of language or the context in which words are spoken, and usually respond only to simple, one-off voice commands.
Mark Zuckerberg’s “robo-butler,” Jarvis, may be as close as we’ve come to a bot that can interpret and respond to human speech; yet according to his Facebook post, even Zuckerberg himself currently prefers texting Jarvis to giving it voice commands.
Still, in many situations, speaking is more convenient, safer, and more natural than other means of communication. You can talk while driving, working out, jogging, shopping, or doing chores. Moreover, the voice interface is already extending the power of computing to people who are unable, for whatever reason, to use screens and keyboards.
It’s just not quite ready for prime time. Yet.