Speak a thousand tongues with one voice
Microsoft has unveiled a text-to-speech system that can speak foreign languages. So what’s new, you might ask. Well, this system goes one better and reads out the foreign-language phrases mimicking the user’s own voice.
In a recent demonstration at Microsoft’s headquarters, the voice of Rick Rashid, the firm’s chief research officer, was sampled and used to read back his phrases translated into Italian, Spanish and Mandarin.
On the website of MIT’s Technology Review, you can listen to Mr Rashid’s attempts to communicate in these languages – which presumably he does not speak himself.
I speak Italian and Spanish and can confirm that his synthesized voice speaking these two languages is comprehensible – although hardly pretty or natural sounding.
My knowledge of Mandarin is limited to ni hao, so I’ll make no attempt to comment on how easy it is to understand Mr Rashid speaking Mandarin.
But it does sound like him, which is important. Preserving a person’s voice when synthesizing their speech in another language is not just more reassuring to a user, but could also make interactions more meaningful and less mechanical, Microsoft argues.
The system needs around an hour of training to develop a model of the sounds and tones of the speaker’s voice. That model is then blended with a database of text-to-speech responses for the desired language.
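The blending step can be pictured with a toy sketch: learn some per-phoneme voice parameters from the speaker's recordings, then mix them into a generic target-language voice database wherever the two overlap. Everything here – the names, the pitch numbers, and the simple weighted-average "blend" rule – is a hypothetical illustration of the idea, not Microsoft's actual method.

```python
# Toy sketch of blending a speaker's voice model with a target-language
# text-to-speech database. All names, data, and the blend rule are
# hypothetical; the real system models far more than pitch.

# Pretend acoustic model: average pitch (Hz) per phoneme, learned from
# roughly an hour of the speaker's recorded English speech.
speaker_model = {"a": 120.0, "i": 180.0, "o": 110.0, "s": 95.0}

# Pretend target-language (say, Italian) TTS database: generic voice
# parameters for each phoneme the language needs.
italian_db = {"a": 130.0, "e": 150.0, "i": 170.0, "o": 115.0, "r": 100.0}

def blend(speaker, target_db, weight=0.7):
    """Return a personalised voice: where the speaker model covers a
    target phoneme, mix its parameters in; otherwise fall back to the
    generic database voice."""
    blended = {}
    for phoneme, generic in target_db.items():
        if phoneme in speaker:
            blended[phoneme] = weight * speaker[phoneme] + (1 - weight) * generic
        else:
            # No sample of the speaker making this sound: use the generic voice.
            blended[phoneme] = generic
    return blended

voice = blend(speaker_model, italian_db)
```

The fallback branch captures the interesting part of the problem: the target language contains sounds the speaker was never recorded making, so the synthesised voice is necessarily a compromise between the speaker's model and the stock voice.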
At first sight, it is easy to dismiss the system as little more than a gimmick. But it addresses a serious issue – one presumably encountered almost every day at Microsoft itself.
For many multinationals, particularly US-based firms, English is not just the first language of most employees but also the lingua franca for internal communications.
That makes sense, and a monolingual corporate culture is not necessarily a drawback – indeed, it eases communications if everyone can get by reasonably well in a common language.
But of course there are always occasions or situations where the “Speak English” rule breaks down and an English-speaking person’s foreign language skills are found seriously wanting.
The most obvious and familiar situation is travel in a strange country – I fondly remember once struggling to find a common language to enquire about bus services in rural Turkey.
Microsoft envisages its system – provisionally dubbed Monolinguist TTS – being a great aid to travelers in this situation. Combined with a speech recognition engine and translation software, the system could translate a monolingual speaker’s words into one of 26 languages and read them back in the speaker’s own voice.
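The three-stage pipeline the article describes – speech recognition, then translation, then text-to-speech in the speaker's own voice – can be sketched as follows. Each stage is a stand-in stub I have invented for illustration; none of these functions or the toy dictionary correspond to a real engine.

```python
# Hypothetical sketch of the interpreter pipeline: recognise speech,
# translate the text, then speak it back in the user's own voice.
# Every stage here is a stub, not a real engine.

def recognise_speech(audio: str) -> str:
    # Stand-in for a speech recognition engine; "audio" is already text.
    return audio.lower().strip()

# Toy English -> Italian phrase table, for illustration only.
TOY_DICTIONARY = {"hello": "ciao", "thank you": "grazie"}

def translate(text: str, lexicon=TOY_DICTIONARY) -> str:
    # Stand-in for translation software: phrase lookup only,
    # passing unknown input through unchanged.
    return lexicon.get(text, text)

def speak_in_own_voice(text: str, speaker: str) -> str:
    # Stand-in for personalised TTS: tag the utterance with the voice model.
    return f"[{speaker}'s voice] {text}"

def interpret(audio: str, speaker: str) -> str:
    return speak_in_own_voice(translate(recognise_speech(audio)), speaker)

print(interpret("Hello", "Rick"))  # -> [Rick's voice] ciao
```

The point of the sketch is the composition: the voice-preserving TTS is the only novel stage, slotted onto the end of an otherwise conventional recognition-and-translation chain.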
The new technique could also be used to help students learn a language, Microsoft believes, or packaged with satellite navigation systems to allow users to customize not just the language but also the voice that reads the directions.
But I see the killer application as the mobile phone. Embedded in a smartphone, the system could act as a personal interpreter, handling informal conversations and meetings with foreign businesspeople in out-of-the-way places where interpreters are not readily to hand but the language barrier cannot be allowed to block communications.
Microsoft has a long history of involvement in speech technologies, and while speech research is not an obvious source of revenue, it is good to see the company still investing in the area.
This particular application was created jointly by researchers at Microsoft’s main Redmond campus and colleagues at Microsoft Research Asia, the company’s second-largest research lab, which is based in Beijing.