For a long time, voice recognition systems existed only in science-fiction literature and movies. But look around: people are talking to machines today, and the machines obey. What writers once considered fantasy has become real. Voice clients are already installed in mobile devices, computers, TVs, washing machines, elevators and cars. If you are interested in the story behind human-machine voice communication, read on.
Progress demands
We are witnessing a growing trend of embedding voice recognition systems across all industries, from tablets to cars. The trend is partly explained by our tech-obsessed modern society, and it is further encouraged by media, cinematography, marketing, TV and the Internet.
But if you look deeper, the roots of humans' aspiration to talk to machines lie in anthropology. Speaking to a machine feels natural, and whatever feels natural makes the task simpler. Historically, mankind exchanged information through interpersonal, spoken communication. Long before people wrote letters, texted and sent messages by e-mail, the only way they exchanged information was by speaking.
The main advantage of voice recognition systems is that the user doesn't need to develop any new skills to operate the program. By comparison, a new computer user must first learn basic computer literacy: how to hold the mouse, how to type on the keyboard. When controlling a machine by voice, however, a person needs no special skill to pronounce a command. Another important aspect is that many tasks can be completed with the voice alone: the voice recognition system reaches into many components of the device, so the person doesn't need to switch between different interfaces.
Limits stimulate development
The target audience of voice recognition systems is quite diverse, and some groups urgently need the option: people with disabilities and car drivers, for example. For many disabled people, voice recognition software is the only way to interact with the outside world independently. Car drivers, meanwhile, have been pushed toward voice recognition. With safety requirements getting stricter all over the world, car manufacturers have started installing voice systems so that drivers can comply with bans on handheld phone use behind the wheel.
As a result of these demands, most mobile manufacturers are also developing and implementing voice recognition clients. While the concept is similar across vendors, product quality varies, and voice client quality is becoming a real differentiator for potential phone buyers when choosing a new device.
Decoding the voice
It's important to understand that testing voice recognition clients differs from any other type of testing. Unlike with a regular mobile application, the tester faces an endless scope of possible input: a good client cannot limit the user to the 10 words the system happens to recognize. Modern voice recognition clients should decode as many commands as possible, which presents developers and testers with a very challenging task. Even the best voice recognition system doesn't guarantee correct decoding 100% of the time; the job of developers and testers is to push the correct-decoding percentage as high as possible.
In voice typing, the stage where mistakes are most likely to occur is decoding. At the architectural level, sounds captured by the microphone go through frequency analysis: first the voice is converted into a waveform, then the waveform is transformed into the characters that build the word. The search tool in the mobile version of Google Chrome, and predictive typing on mobile phones and tablets, are good examples of this process.
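As a rough illustration of the frequency-analysis step described above, here is a toy sketch. All names, parameters and the synthetic "voice" are illustrative assumptions; real recognizers use mel-scale filter banks and trained acoustic models, not a raw FFT over a sine wave.

```python
# Toy sketch of the first decoding stage: slicing captured audio into
# short overlapping frames and running frequency analysis on each one.
# Illustrative only -- real systems feed these features to a trained
# acoustic model to score candidate characters.
import numpy as np

def frequency_frames(samples, frame_len=400, hop=160):
    """Split a 1-D audio signal into overlapping frames and return
    the magnitude spectrum of each frame."""
    frames = []
    for start in range(0, len(samples) - frame_len + 1, hop):
        frame = samples[start:start + frame_len]
        # The waveform becomes numbers a model can reason about.
        spectrum = np.abs(np.fft.rfft(frame * np.hanning(frame_len)))
        frames.append(spectrum)
    return np.array(frames)

# One second of a fake 440 Hz "voice," sampled at 16 kHz.
audio = np.sin(2 * np.pi * 440 * np.arange(16000) / 16000)
features = frequency_frames(audio)
```

With a 400-sample frame each bin covers 40 Hz, so the energy of the 440 Hz tone lands in bin 11 of every frame; in a real client these per-frame spectra, not the raw waveform, are what the decoder consumes.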
However, it gets more complex with multi-functional applications, where voice recognition consists of two stages. First, the client decodes the voice and assembles the whole phrase. Second, a complex algorithm kicks in and analyzes each word separately and the phrase as a whole. This is where most mistakes occur. Such voice recognition systems are quite bulky, so they run on servers, while the mobile device carries only a small client that records the voice, sends it to the servers and receives the commands back to perform. To optimize testing and bug fixing on both the server and the client, mistakes should be strictly differentiated between the two.
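One way to differentiate mistakes between the thin client and the server, as the paragraph above suggests, is to route each failed recognition attempt by where it broke down. The error categories and the `classify_failure` helper below are invented for illustration; real systems log far richer diagnostics.

```python
# Hedged sketch: tagging recognition failures by origin so bugs land
# in the right queue (client team vs. server team).
from enum import Enum

class ErrorOrigin(Enum):
    CLIENT = "client"   # recording, encoding or upload problems
    SERVER = "server"   # decoding or phrase-analysis problems

def classify_failure(report):
    """Route a failed recognition attempt by where it broke down."""
    if not report.get("audio_uploaded"):
        return ErrorOrigin.CLIENT      # audio never reached the server
    if report.get("decoded_text") is None:
        return ErrorOrigin.SERVER      # upload ok, decoding failed
    # Decoding succeeded but phrase analysis chose the wrong command.
    return ErrorOrigin.SERVER
```

For example, `classify_failure({"audio_uploaded": False})` returns `ErrorOrigin.CLIENT`, keeping server-side decoding bugs out of the client team's backlog.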
Testing the client: with female voice and in the pub
The way we speak and the way we pronounce words are the kinds of factors that affect voice recognition systems. The system may recognize different voice pitches and timbres differently, and every person speaks at his or her own pace. All of this should be taken into account when choosing testing scenarios. It is recommended to choose a quality assurance engineer with average pitch, timbre and speaking speed; ideally, the same functions are tested with both male and female voices. When testing a client for a foreign language, it's good to have a tester who can speak without an accent, so you don't end up like the guys in this video clip.
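The speaker-variation advice above can be organized as a simple test matrix: the same command set is replayed across speaker profiles that vary pitch, speed and gender. The profile fields and command strings are hypothetical examples, not part of any real test suite.

```python
# Sketch of a voice-characteristics test matrix: every command is
# paired with every combination of speaker traits mentioned above.
from itertools import product

PITCHES = ["low", "average", "high"]
SPEEDS = ["slow", "average", "fast"]
GENDERS = ["female", "male"]

def build_test_matrix(commands):
    """Pair every command with every speaker profile."""
    profiles = [
        {"pitch": p, "speed": s, "gender": g}
        for p, s, g in product(PITCHES, SPEEDS, GENDERS)
    ]
    return [(cmd, prof) for cmd in commands for prof in profiles]

matrix = build_test_matrix(["call home", "play music"])
# 2 commands x (3 pitches x 3 speeds x 2 genders) = 36 cases
```

Even this tiny matrix shows why voice testing balloons quickly: two commands already produce 36 cases before accents or environments are considered.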
The tester should also anticipate different environments; it's not enough to test only in a quiet room. Noisy streets, crowded pubs, public transport: the voice client should be able to decode the human voice anywhere.
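One hedged way to prepare such "noisy environment" test audio is to mix a clean command recording with background noise at a chosen signal-to-noise ratio. The synthetic signals below stand in for real recordings; the function and scenario names are illustrative.

```python
# Toy sketch: mixing clean command audio with background noise at a
# target SNR, so the same command can be replayed as if spoken on a
# quiet sofa, a noisy street or in a crowded pub.
import numpy as np

def mix_at_snr(clean, noise, snr_db):
    """Scale the noise so the mix has the requested SNR, then add it."""
    clean_power = np.mean(clean ** 2)
    noise_power = np.mean(noise ** 2)
    target_noise_power = clean_power / (10 ** (snr_db / 10))
    scaled = noise * np.sqrt(target_noise_power / noise_power)
    return clean + scaled

rng = np.random.default_rng(0)
clean = np.sin(2 * np.pi * 300 * np.arange(8000) / 8000)  # fake voice
street = rng.normal(size=8000)        # stand-in for street noise
pub_mix = mix_at_snr(clean, street, snr_db=5)             # harsh case
```

Sweeping `snr_db` from, say, 20 down to 0 turns one recording into a whole ladder of environments, which is cheaper and more repeatable than sending a tester to an actual pub for every regression run.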
What else can undermine voice client performance? If supporting hardware such as headsets, Bluetooth accessories and the like doesn't function correctly, the client can fail at the task at hand. The need for an instant, reliable connection challenges developers and testers to diminish the impact of Internet connection quality. It also helps if the tester emulates other user scenarios, such as music playing on the phone, incoming calls and other interruptions.
It's not easy to imitate a user while testing voice recognition clients, but this is precisely the case where the "do as a user does" approach is the key to success. An experienced tester can devise many realistic user scenarios to ensure high quality in the final product.
Although voice clients are at a peak of popularity, there is still a huge niche in which they can be developed and adopted. This gives developers plenty of room to improve current software and create new products. At the same time, being involved in this process carries great responsibility: every tester should keep in mind the millions of people who will use the voice recognition software they have tested and improved. The right approaches and optimal testing strategies will let every user be satisfied with the communication channel you have enabled for them.
Nadia Knysh is head of the Agile QA department for A1QA, with seven years of leadership experience. She is responsible for Agile methodology development within the company. Her solid technical and management background helps her to correctly allocate the team’s workflow. Knysh is a Certified ScrumMaster, and she holds a master’s degree in software engineering and an ISTQB certificate.
Editor’s Note: In an attempt to broaden our interaction with our readers we have created this Reader Forum for those with something meaningful to say to the wireless industry. We want to keep this as open as possible, but we maintain some editorial control to keep it free of commercials or attacks. Please send along submissions for this section to our editors at: dmeyer@rcrwireless.com.