Introduction to voice technologies
A voice user interface (VUI) is a software interface designed to free your hands and eyes and to simplify entering or receiving information.
The voice is a natural communication tool. Many people prefer to resolve issues orally rather than in writing simply because it is faster. For a business, talking to customers is a convenient and natural way to interact. But few companies can expand their call center staff in proportion to the growth of the customer base. Automation is becoming an effective way to scale live communication with customers. It lets you keep the usual methods of communication and handle a larger number of contacts without sacrificing quality.
Voice technologies are used in many areas and suit any audience: children are attracted by an interactive “talker,” young people appreciate voice control of smart devices, and an assistant reads the news to the elderly. But voice assistants are most in demand in industries with many short, targeted interactions with customers: finance, retail, and telecom.
Interestingly, Generation Z (those born in the early 2000s) and early Generation Alpha (children born from the early 2010s onward) may be the last to type on a keyboard. Marketing Land columnist and marketer Andrew Ruegger argues in his column that the next generation will consist almost exclusively of voice-command users. “Keys like ‘OK Google’ are becoming more common in search query reports. And we are even seeing their growth in Google Trends,” the expert writes.
Voice interfaces are penetrating deeper into business processes. Today it is not the robot’s voice on the phone that surprises the user, but a well-timed joke or an apt phrase. With the help of voice assistants, businesses optimize and speed up customer interaction, but many users feel an aversion to “soulless interlocutors.”
That is why digital technologies try not only to listen to a person’s commands but also to read emotions from the face, so that interaction with users feels natural.
The largest companies have been using voice technology for years. All of the Big Five tech companies (Microsoft, Google, Amazon, Apple, and Facebook) have developed AI voice assistants. Since 2017, Bank of America has been running Erica, a virtual assistant. In 2018, Mercedes-Benz introduced the Mercedes-Benz User Experience (MBUX), an infotainment system that understands voice commands. Retailer Walmart has launched an application with the Ask Sam voice assistant, which helps store employees with product searches. According to Adobe Analytics, 91% of brands already invest heavily in voice solutions and plan to increase investment.
What characterizes the voice interface, and how does it differ from the usual visual one?
Experts from the Nielsen Norman Group identified five essential voice user interface techniques:
- Voice input: Commands are spoken rather than entered through a keyboard or graphical elements.
- Natural language: Users are not limited to a specific, computer-optimized vocabulary or syntax but can structure the input in any way, as if it were a human conversation.
- Voice output: Information is spoken rather than displayed on the screen.
- Intelligent interpretation: To truly understand the user’s queries, the assistant must use additional information, such as the context of use or actions the user has previously taken.
- Agency: The assistant performs actions that are necessary to complete the user’s task but that the user did not explicitly request.
Examples of industries where voice technologies perform actual tasks
- Retail
Retailers and e-commerce companies actively use voice assistants and smart speakers. In the US, 45% of residents aged 22-37 already buy through voice assistants, according to the CouponFollow Millennial Shopping 2019 report. OC&C analysts predict that within two years up to 6% of all online purchases in the country will be made this way. Today, some North American retailers use voice technology to automate the buying process fully. For example, Walmart has integrated with Siri and Google Assistant, through which you can place an order and pay for it with a linked card.
Why is voice technology so attractive in online shopping? The secret is simple: it makes shopping more convenient for customers. You can, for example, ask a search engine “Where to buy jeans?” and quickly get a response without wasting time typing.
On a global scale, voice assistants are gradually becoming an integral part of the team in retail niches characterized by quick, repeat orders. Trendsetters include industry leaders such as Amazon and McDonald’s. Highly specialized niches (sales of engineering equipment, etc.) will remain on the technological periphery for a long time. They have not yet reached online sales, let alone voice assistants, which, objectively, they do not yet need (voice queries like “where to buy an engineering machine” are much less common than “where to buy sneakers”).
- Automated calls
Automated calls are one of the most prominent applications of voice technology. The customer only needs to press a button to start talking to an automated assistant. Companies thus remove the annoying “hanging on the line” phase of waiting for a live consultant. Automating call centers and developing their teams opens up far greater opportunities for a business than simply delegating customer communication. Voice assistants are a powerful tool for collecting data, monitoring and improving service quality, optimizing processes, and checking compliance with corporate standards.
- Voice assistants in medicine
Voice interfaces are becoming personal doctors with broad expertise: their “memory” is practically inexhaustible. Intelligent assistants not only advise but also prescribe treatment, and they rarely forget or overlook something in the analysis. An example is Triad Health AI’s medical system, which uses Google Home and Amazon Alexa to treat Parkinson’s disease.
How voice technology is implemented in contemporary mobile app development
Voice technology is becoming increasingly popular in mobile app development, as it allows users to interact with their devices more naturally and intuitively. There are several different ways that voice technology can be implemented in mobile apps, but some of the most common include the following:
- Voice commands: This feature allows users to control the app or perform specific actions using voice commands. For example, users could say “take a photo” to launch the camera app and take a picture.
- Voice recognition: This feature allows the app to recognize spoken words or phrases and respond accordingly. For example, a voice-enabled keyboard could recognize when a user says “new line” and insert a line break in the text they are typing.
- Speech-to-text: This feature allows the app to transcribe spoken words into written text. This can be useful in many different apps, such as note-taking or dictation apps.
- Text-to-speech: This feature allows the app to convert written text into spoken words, which can be helpful for accessibility or for apps that want to read content to the user.
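The “new line” example above can be sketched in a few lines. This is only an illustration: it assumes a speech recognizer has already returned the transcript as a string, and the function name `apply_dictation_commands` and the command table are hypothetical, not part of any real SDK.

```python
import re

# Hypothetical table of spoken editing commands and the text they insert.
DICTATION_COMMANDS = {
    "new line": "\n",
    "new paragraph": "\n\n",
}


def apply_dictation_commands(transcript: str) -> str:
    """Replace spoken editing commands in a transcript with the characters they stand for."""
    result = transcript
    # Try longer phrases first so a long command is never shadowed by a shorter one.
    for phrase in sorted(DICTATION_COMMANDS, key=len, reverse=True):
        # Match the phrase as whole words, ignoring case, and swallow the spaces around it.
        pattern = re.compile(r"\s*\b" + re.escape(phrase) + r"\b\s*", re.IGNORECASE)
        result = pattern.sub(DICTATION_COMMANDS[phrase], result)
    return result.strip(" ")


if __name__ == "__main__":
    print(apply_dictation_commands("Dear team new line see you on Monday"))
```

A real keyboard would also have to decide whether the user meant the command or the literal words “new line”; this sketch always treats the phrase as a command.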
Various frameworks and libraries are also available for implementing voice technology in mobile apps, such as Apple’s SiriKit and Google’s Voice Actions API for Android. These frameworks provide pre-built functionality for everyday tasks such as sending messages, scheduling appointments, and making phone calls, saving developers a lot of time and effort.
There are also many third-party Speech Recognition APIs available, like Google Cloud Speech-to-Text API, Amazon Transcribe, and Microsoft Azure Speech Services. These services provide pre-trained models and easy-to-use APIs to get speech-to-text, text-to-speech, and other features.
Finally, when implementing voice technology in a mobile app, keep the user experience in mind: the accuracy of the speech recognition, the response time of the app, and the overall user flow.
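One common way to put a number on recognition accuracy is word error rate (WER): the word-level edit distance between a reference transcript and the recognizer’s output, divided by the number of reference words. The article does not name a metric, so choosing WER is an assumption here, and the function below is a minimal sketch rather than a full evaluation tool.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance between the two transcripts, divided by reference length."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference transcript must contain at least one word")

    # prev[j] holds the edit distance between the reference words processed so far and hyp[:j].
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,         # reference word missing from the output (deletion)
                curr[j - 1] + 1,     # extra word in the output (insertion)
                prev[j - 1] + cost,  # match or substitution
            ))
        prev = curr
    return prev[-1] / len(ref)


if __name__ == "__main__":
    print(word_error_rate("turn on the living room lights",
                          "turn on the living room light"))
```

A lower value is better; 0.0 means the output matches the reference word for word.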
Examples of voice technologies used in mobile apps
Voice technology is already used in many mobile apps and other products. Here are a few examples:
- Virtual personal assistants: One of the most famous examples of voice technology is virtual personal assistants, such as Apple’s Siri, Amazon’s Alexa, and Google Assistant. These assistants can set reminders, play music, and provide weather and news updates.
- Voice search: Many mobile apps and websites now include a voice search feature, which allows users to search for content using their voice rather than typing. This feature is handy on mobile devices, where typing on a tiny keyboard can be tedious.
- Voice control of smart home devices: Voice technology can also control smart home devices, such as lights and thermostats. For example, users can say “turn on the living room lights” to control lights in their home via voice.
- Voice-enabled chatbots: Some businesses are now using voice-enabled chatbots to help customers with their inquiries. For example, a bank might use a chatbot to assist customers with account information and transactions via voice commands.
- Voice-controlled gaming: Voice technology is also used to allow players to interact with games using their voice. This can be used for things like giving commands to in-game characters, dictating in-game chat, or even controlling game elements using voice commands.
- In-car systems: Automobile manufacturers are incorporating voice technology to control various systems in the car, such as air conditioning, navigation, music, and some advanced functions.
- Voice dictation: Typing and note-taking apps let users dictate text instead of entering it by hand.
These are just a few examples of how voice technology is used today. As the technology continues to improve, we can expect more and more innovative uses.
Main trends of voice interfaces
Although voice interfaces are quite well developed today, the technology has not yet reached its limit. In the coming years, it will develop in several directions.
- For success and visible results, bots need emotions. That is why many developers are working to bring the synthesized voice closer to a human one. In the future, robots will be able to speak with a local accent and use filler words, interjections, and even slang.
- Visual development. Voice assistants will soon gain a face, which will expand the scope of their application. For example, in HR, they will be able to conduct interviews and analyze the veracity of information. Offline retail will launch bots into salesrooms in the form of interactive screens that, like consultants, will assess needs and help customers make a choice.
- The bot cannot always answer users in a dialogue. This is where a human comes to the rescue. On the one hand, customer questions are not left unattended. On the other hand, the bot learns and becomes smarter thanks to the response templates that a live manager leaves. It is also a great way to develop the technology and increase consumer loyalty.
- Today voice assistants distinguish only a narrow range of emotions, from joy to anger, and can identify the speaker’s gender. But some areas need recognition of the full range of emotions. Developers are working to create more “responsive” home gadgets that can adjust to the mood of the owner or client.
- Cross-platform. There are many individual products on the market, each with its advantages and disadvantages. Combining them in one device would expand the possibilities of voice control.
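The human handoff mentioned in the list above can be illustrated with a small sketch. Everything here is hypothetical: the `HandoffBot` class, the `CONFIDENCE_THRESHOLD` value, and the simple word-overlap score stand in for whatever intent model a real bot would use.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional

CONFIDENCE_THRESHOLD = 0.5  # assumed cut-off; a real system would tune it on real dialogues


def _overlap(a: str, b: str) -> float:
    """Jaccard similarity between the word sets of two utterances."""
    words_a, words_b = set(a.lower().split()), set(b.lower().split())
    union = words_a | words_b
    return len(words_a & words_b) / len(union) if union else 0.0


@dataclass
class HandoffBot:
    # Known questions mapped to the replies a live manager gave earlier.
    templates: Dict[str, str] = field(default_factory=dict)

    def answer(self, question: str) -> Optional[str]:
        """Return a stored reply, or None when the bot should hand off to a human."""
        best = max(self.templates, key=lambda q: _overlap(q, question), default=None)
        if best is None or _overlap(best, question) < CONFIDENCE_THRESHOLD:
            return None
        return self.templates[best]

    def learn(self, question: str, operator_reply: str) -> None:
        """Store the live manager's reply as a template for similar future questions."""
        self.templates[question] = operator_reply


if __name__ == "__main__":
    bot = HandoffBot()
    question = "what is my account balance"
    if bot.answer(question) is None:
        # In a real system the call would be routed to a live manager here.
        bot.learn(question, "Your balance is shown on the app's home screen.")
    print(bot.answer("what is my balance"))
```

Returning `None` instead of a guess keeps the decision to escalate explicit, so the calling code can route the customer to a person and record that person’s reply.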
Although voice assistants are just starting to enter a phase of active growth, one thing is clear: they will definitely surprise us in the future. With improvements in speech recognition technology and the integration of bots with visual technologies, we will see a new era of automation, where voice assistants will acquire both a human voice and a face.
Conclusion
In the near future, voice interaction will become more widespread in almost all areas of activity. Devices capable of recognizing voice and generating it are rapidly becoming cheaper with the development of voice assistants and the ubiquity of the Internet.