Article·AI & Engineering·Sep 27, 2024

Why Voice Technology No Longer Sucks: A Critique and History

By Tife Sanusi
Published Sep 27, 2024
Updated Sep 19, 2024

Voice technology is one of the most exciting technologies in wide use today. Since voice assistants entered the mainstream in the 2010s, it has continued to impress and assist millions of people worldwide. And now, with advances in deep learning and neural networks, new use cases for voice technology are being discovered every day.

Currently, many Americans use voice technology to control cars, security systems, home appliances like TVs and vacuum cleaners, and other smart devices. Companies are not left behind, using the technology for everything from automating customer service lines to providing voice-assisted alternatives to their products.

With all of the innovative and impressive tools and products that voice technology has afforded us, it can be a little difficult to imagine a time when we did not have them. Before the 2010s, for example, voice assistants were not in mainstream use, and every query or task had to be typed into a search engine or done by hand. To fully appreciate the technology we have now and understand what the future could look like, we have to acknowledge how much voice technology has evolved over the years.

Ten years ago, we would not have had the capacity or ability to create voice technologies like OpenAI’s GPT-4o or Deepgram’s Medical Transcription Model. In this article, we’re going to explore just how much voice technology has changed throughout history and discover how the technology is being used for good.

Historical background 

The history of voice technology is a fascinating one with multiple tie-ins to pop culture phenomena. One of the earliest attempts to create a voice technology system was at Bell Laboratories in 1952, when scientists built a system for isolated digit recognition for a single speaker.

Bell Laboratories, notable for developing multiple technologies including the laser and the transistor, comes up often throughout the history of voice technology and is the site of one of those pop culture tie-ins: Abraham Weissman, a main character in The Marvelous Mrs. Maisel, the award-winning show set in the 1950s, is a researcher at Bell Labs working on voice technology. Bell Labs was also the site of one of the most widely known incidents in voice technology history, the rendition of the song “Daisy Bell” by an IBM 704 computer, which inspired a scene in the blockbuster movie 2001: A Space Odyssey.

By the 1970s, research at labs like NEC Laboratories, Bell Labs, and IBM Labs had resulted in progress in continuous speech recognition and template-based isolated word recognition. The Defense Advanced Research Projects Agency (DARPA) also funded multiple voice technology research projects, resulting in many systems and technologies that we still use today.

In 1975, MUSA (Multichannel Speaking Automaton), one of the first speech synthesis systems, was released. The system was made up of computer hardware and specialized software. Research in the 1980s mostly involved developing systems that could recognize larger and more complex vocabularies. One of the most notable developments was the introduction of the hidden Markov model (HMM) approach to speech recognition. By the mid-80s, virtually every voice tech laboratory in the world was using the approach. The 80s also saw the reintroduction of neural networks into voice technology systems.

Voice technology research grew rapidly in the 1990s, especially after the introduction of faster microprocessors. Later that decade, Dragon Dictate, the world’s first voice technology system targeted at consumers, was released. Dragon Dictate was a speech-to-text system for general-purpose dictation and was a success, winning numerous consumer awards. The 90s were also a notable decade for women researchers, with Ann Syrdal at AT&T Bell Laboratories creating the first female-sounding synthetic voice.

The following decade saw research shift toward more accurate and conversational voice technology systems. This included spontaneous speech recognition systems and robust speech recognition models. In 2011, Apple released Siri, a product of DARPA-funded research and a milestone at the time. The app was able to support basic commands and calendar appointments. Shortly after, in October 2012, Google launched its upgraded Google Voice Search feature, which was meant to compete with Siri.

Major milestones in the evolution of voice technology 

Voice technology has gone through a series of major milestones that completely changed how we approach and build voice tech systems. One of the first milestones was Thomas Edison’s invention of the phonograph in 1877, followed decades later by the development of the Voder. The phonograph was the first device able to record and reproduce sound, and it was a massive breakthrough at the time. In 1936, a team of researchers and engineers at Bell Labs started working on the first electronic speech synthesizer and would eventually produce the Voder in 1939. These two inventions kickstarted the early stages of voice technology research, leading to the development of tools and technologies that are still used today.

Another major milestone in the development of the voice technology in use today was the introduction of voice assistants in the 2010s. Their roots go back to 2003, when DARPA funded the Cognitive Assistant that Learns and Organizes (CALO) project in an attempt to develop the world’s first virtual assistant. The project was led by the Stanford Research Institute (SRI), with technology from Nuance Communications, and set out to build a virtual assistant that could handle basic tasks and requests.

Realizing the potential of their technology, some SRI researchers created their own startup, Siri, to launch the tool that they had created. Apple bought the startup, and the assistant was released to the public in October 2011. The development of Siri began a race among the top tech companies to create their own virtual assistants, ushering in a new era of voice tech research and development.

Current innovations in voice technology 

Voice technology has come a long way from the era of phonographs and Voders or the early years of virtual assistant technology. With the introduction of machine learning and natural language processing, we now have more advanced, human-like technology that is able to understand, analyze, and replicate the nuances and complexities of human language and interactions. Even virtual assistants like Siri, Alexa, and Google Assistant are a lot more advanced than they used to be, allowing users to carry out complex tasks and control smart devices (built with voice technology) in their homes. Interactive assistants are able to schedule and manage calls and appointments, take and transcribe notes, and place phone calls, in addition to hundreds of other tasks.

Apart from virtual assistants, voice technology is also being used in different industries to create effective and reliable alternatives to traditional work processes. For example, voice personalization based on voice tech allows clients to create tailored and customized interactions for their customers, giving them an immersive and engaging experience when using a product. This makes voice personalization a sought-after technology for customer-facing companies and startups.

Other uses of voice technology include streamlining customer service experiences and interactions. Voice tech has also proven useful in fraud detection, retail and ecommerce management, translation and transcription, and inventory management. Deepgram’s speech-to-text model, for example, provides faster turnarounds, with transcription speeds that are 4 to 40 times faster than other models on the market.

The future of voice technology 

Even though voice technology is still relatively new, we already have many applications for it, providing simpler and more efficient ways to work and navigate daily tasks. As voice technology continues to grow, we can expect it to become better and more advanced, resulting in more human-like experiences. According to the State of Voice Technology report by Opus Research, 54% of business leaders surveyed agreed that human-like AI voice robots are only about one to three years away. This would mean voice robots that sound indistinguishable from humans and can carry on intelligent, human-like conversations.

The advancement of voice AI technology will also result in the continued integration of voice technology across industries. While industries like healthcare and education are already using voice tech to manage and streamline client services and to provide accessible services to disabled patients and customers, there are still many more ways that voice tech could be used. With voice technology, industries will be able to improve their efficiency and user experience. In the future, we will also have access to even more advanced speech recognition and translation technology, allowing us to reach a broader audience and break down the barriers of language and accent in real time.

Conclusion

Voice technology is an evolving field that continuously provides new and innovative ways to understand and analyze speech. From its humble beginnings, which gave us the phonograph and the Voder, to the highly accurate and human-like voice robots that we have today, voice technology is constantly advancing for the better. With the context of how voice technology research began, we can see that voice technology has improved significantly over the years and will only continue to do so.
