It’s confirmed: the Machines are taking over. From ad platforms to surveillance systems, autonomous AI systems are scanning our digital footprints to sell us everything from tennis shoes to dinner, and they are even tampering with our elections. Self-driving Ubers, with at least one tragedy already behind them, have shown us that we may have gone too far.
Meanwhile, Google is teaming up with the US military’s Project Maven to bring AI-powered drones to the battlefield. Law professor Ian Kerr of the University of Ottawa is working hard to make sure we don’t go full Skynet, pushing for a proposed international ban on the use of AI in weapons of war.
Whether or not you like the sound of AI pouring into every aspect of our lives, the question remains: how would you like your AI to sound?
Sophisticated voice activation systems paired with powerful text-to-speech engines are the latest frontier in all things digital. Today, most smartphones let us interact with virtual personal assistants that answer our questions and remind us of upcoming appointments. But the dulcet tones of Siri, Cortana and Alexa are evolving under surprisingly intense scrutiny, and that scrutiny reveals what our favorite tech firms believe we humans want, and what the AI wants from us.
We’ve come a long way since those annoying Interactive Voice Response (IVR) systems that nearly every big company put on its phone lines to cut the cost of customer service.
Apple launched the first viable virtual assistant in 2011: Siri, on the iPhone 4S. The field has since grown to include Amazon’s Alexa, Microsoft’s Cortana and Google’s decidedly uncreatively named Google Assistant. But Apple remains ahead of the crowd when it comes to voice: Siri speaks 21 different languages, with male and female voice settings. All of these assistants get better every year at forming human-sounding sentences, making them seem ever more real. But this is only the beginning of giving a voice to our tech.
One of the leading vocal coaches working on giving voice to AI is Terry Danz. In a recent interview, Danz shed some light on what lies behind our virtual assistants’ voices, sharing her thoughts on how Siri evolved from iOS 9 in 2015 through iOS 11 in 2017.
Danz says: “As the versions progress from iOS 9, the actual pitch of the voice becomes much higher and lighter. By raising the pitch, what people hear in iOS 11 is a more energized, optimistic-sounding voice. It is also a younger sound.”
“The higher pitch is less about the woman’s voice being commanding and more about creating a warmer, friendlier vocal presence that would appeal to many generations, especially millennials,” Danz continues. “With advances in technology, it is becoming easier to adapt quickly to a changing marketplace. Even a few years ago, things we now take for granted in vocal production may not have been developed, used or adopted.”
Her observations echo those of Clifford Nass and Scott Brave in their book Wired for Speech: How Voice Activates and Advances the Human–Computer Relationship. The book is a study of how we talk to machines and how we respond to their voices.
According to Nass and Brave’s research, we seem to prefer talking to machines that share our gender. Yet they also concluded that both men and women are more likely to follow instructions given by a male computer voice, which suggests that most of us carry a built-in or learned sense of male authority. At the same time, both men and women prefer a female voice to teach them, especially about relationships, while preferring a male voice when the subject is technical.
Companies already put our perceptions of gendered computer voices to use in their telephone IVR systems, choosing a female voice for the complaint department but a male voice for sales.
But even more important than gender, or our preference for a pitch-perfect voice, is our own state of mind when we talk to a virtual assistant. The next area of development for machine voicing is making computers better at analyzing our tone when we speak back to them. Tech companies are building the capability to respond to us dynamically, using verbal cues and sentence timing that go beyond the words themselves to strike just the right emotional level for the conversation. In this way, the machines will learn to mimic the sentience of a thinking, feeling human being.
The technology is being developed in the hope of making your virtual assistant more effective at whatever job is at hand. But are we ready for a more human-sounding machine?
The goals of this AI-voice research range from optimizing Alexa to sell us more stuff on Amazon to helping Google Assistant in Google Maps calm us down when we’re stuck in a traffic jam.
Far more research is going into computerized voice interactions than simply finding a pleasing tone. That research raises serious ethical questions about reinforcing gender roles and, of course, business concerns about developing a more human-sounding machine.