Do voice commands show true intelligence?
Imagine a world where you can talk to almost any electronic device in your household: you give commands to your fridge, your microwave, your car, your mobile and your TV without pressing a single button. Your voice alone controls the actions you want the device to carry out. It is a hands-free society, ruled by voice and gesture recognition. But aren’t we there already? Voice control technology already lets you make simple requests like ‘call Fred.’ However, there is one fundamental question about this technology: how intelligent are these human-machine conversations?
Thinking about it, I am not sure we can even call them conversations. According to my dictionary, a conversation is “the spoken exchange of thoughts, opinions, and feelings”. Giving commands to a machine hardly fits this description.
For a human-machine dialogue to be considered a conversation at all, you as a human should be able to talk to the machine in a natural way, just as you would talk to any other human being, using everyday language, including slang and colloquial words. In return, the machine should understand and respond with suitable answers. If it can do this, then we truly are getting close to an intelligent human-machine conversation.
What we do have is the technology behind voice commands, called Automatic Speech Recognition (ASR), also known as Computer Speech Recognition or Speech to Text, which converts spoken words to text. It is often used in applications such as voice dialing and dictation tools, and increasingly in gaming and mobile technology.
However, ASR doesn’t have humanlike intelligence. It can’t qualify a question by asking for more information. It can’t remember. It can’t consult other sources to find an answer. In summary, it is not able to deliver intelligent solutions.
So how do we add this intelligence to voice commands, you might ask. The answer is by using Natural Language Interaction (NLI) technology.
With NLI technology we are able to make a machine understand questions however they are phrased. It can even remember information and use it later in the conversation. Technically speaking, NLI works in three steps. First, it analyzes your query using linguistic understanding libraries that derive the meaning of the query. It then interprets that meaning using linguistic and business rules that simulate ‘intelligent thinking’, allowing it to reason in a humanlike way and determine the most appropriate action. Finally, it performs the necessary action by giving an appropriate response.
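The three steps, together with the idea of remembering something for later, can be sketched in a few lines of Python. This is only a toy under stated assumptions, not how any NLI product is actually built: the names `analyze`, `interpret`, `respond`, `handle` and `Conversation` are invented for illustration, the input is assumed to be text that has already been recognized, and two hard-coded patterns stand in for real linguistic libraries and business rules.

```python
import re
from dataclasses import dataclass, field


@dataclass
class Conversation:
    """Holds facts picked up earlier so that later turns can use them."""
    memory: dict = field(default_factory=dict)


def analyze(utterance: str) -> dict:
    """Step 1: derive a rough 'meaning' from the words (toy patterns only)."""
    text = utterance.lower().strip(" .!?")
    match = re.fullmatch(r"my name is (\w+)", text)
    if match:
        return {"intent": "give_name", "name": match.group(1).capitalize()}
    if text in ("what is my name", "who am i"):
        return {"intent": "ask_name"}
    return {"intent": "unknown"}


def interpret(meaning: dict, convo: Conversation) -> str:
    """Step 2: apply simple rules to choose an action, updating memory."""
    if meaning["intent"] == "give_name":
        convo.memory["name"] = meaning["name"]
        return "greet"
    if meaning["intent"] == "ask_name":
        # Without a remembered name, ask for more information instead.
        return "recall_name" if "name" in convo.memory else "ask_for_name"
    return "ask_to_rephrase"


def respond(action: str, convo: Conversation) -> str:
    """Step 3: carry out the chosen action by producing a reply."""
    name = convo.memory.get("name", "")
    replies = {
        "greet": f"Nice to meet you, {name}.",
        "recall_name": f"You told me your name is {name}.",
        "ask_for_name": "I don't know yet. What is your name?",
        "ask_to_rephrase": "Sorry, could you put that another way?",
    }
    return replies[action]


def handle(utterance: str, convo: Conversation) -> str:
    """Run one turn of the conversation through all three steps."""
    return respond(interpret(analyze(utterance), convo), convo)
```

Calling `handle("My name is Anna.", convo)` and later `handle("What is my name?", convo)` on the same `Conversation` object lets the second turn draw on what the first one stored, which is the kind of remembering and follow-up questioning that plain speech-to-text does not provide.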
For example, with ASR your Xbox game will understand “stop game”, but only that one command spoken in one specific way. With NLI implemented you could say “I want to stop / please, let’s quit / I don’t want to play anymore / this is boring, let’s do something else”, etc. and the program would understand all these inputs as meaning the same thing. NLI has added the intelligence which makes it possible for you to speak to your game console in exactly the same way you would talk to a friend.
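To make the contrast concrete, here is a deliberately tiny Python sketch. It does not reflect how the Xbox or any real ASR or NLI system works: the function names, the cue phrases and the `STOP_GAME` label are all made up for illustration, the speech is assumed to have been converted to text already, and a short list of cue phrases is a crude stand-in for real linguistic analysis.

```python
import re

# Hypothetical command table for a command-only system:
# one exact phrase per action.
EXACT_COMMANDS = {"stop game": "STOP_GAME"}

# Hypothetical cue phrases standing in for real linguistic analysis.
STOP_CUES = ("stop", "quit", "don't want to play", "boring", "something else")


def normalize(utterance: str) -> str:
    """Lowercase, unify apostrophes, drop punctuation, collapse spaces."""
    text = utterance.lower().replace("’", "'")
    text = re.sub(r"[^a-z' ]+", " ", text)
    return " ".join(text.split())


def command_only(utterance: str):
    """Accept a command only when it matches the stored phrase exactly."""
    return EXACT_COMMANDS.get(normalize(utterance))


def intent_based(utterance: str):
    """Map many different phrasings to one intent via cue phrases."""
    text = normalize(utterance)
    if any(cue in text for cue in STOP_CUES):
        return "STOP_GAME"
    return None
```

With these definitions, `command_only` recognizes “stop game” but returns `None` for “I want to stop”, while `intent_based` maps every phrasing from the example above to the same `STOP_GAME` intent. Simple substring matching like this would misfire on plenty of real sentences, which is exactly why real systems rely on proper linguistic analysis instead.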
In short, ASR does not contain Artificial Intelligence or personality; it only recognizes voice and certain coded commands. NLI, however, does contain elements of Artificial Intelligence as well as personality, something Artificial Solutions’ Elbot is a great example of.
Elbot is a chatbot that was crowned King of Artificial Intelligence in 2008 when he won the Loebner Prize. So far he is the only chatbot to have convinced 3 out of 12 judges (25%) that he is a human and not a machine. Had Elbot convinced one more judge, he would have passed the 30% mark. That threshold goes back to Alan Turing’s 1950 proposal of the Turing Test, on which the Loebner Prize is based, and is used to decide whether a machine is capable of thinking like a human. This makes Elbot a true showcase of the capabilities that NLI technology entails.
Jumping back from NLI capabilities to ASR: recently there has been a lot of talk about the launch of voice recognition technology for television. This technology is designed to make it simple to give the TV a variety of commands just by speaking. However, if you are looking for an intelligent chat with your TV, you had better wait for NLI technology to be picked up by the mainstream hardware manufacturers. Until then, voice commands will still offer a limited user experience with less intelligence than chatbots such as Elbot.