Speech recognition


Speech recognition is a technology that allows computers to interpret human speech, converting it into text or commands that a computer[2] can process. This technology[1] was first developed in the 1950s by Bell Labs with a device named Audrey, designed for single-speaker digit recognition. Notable milestones since then include IBM’s demonstration of speech recognition at the 1962 World’s Fair, the proposal of linear predictive coding in 1966, and DARPA’s funding of Speech Understanding Research in 1971. Later methods such as hidden Markov models and deep learning have significantly improved recognition accuracy. The technology is now applied in sectors including in-car systems, education, healthcare, and government intelligence. Its primary function is to translate spoken language into written text, and it is also used to help assess and treat speech disorders.

Term definitions
1. technology. Technology, derived from the Greek words meaning craft and knowledge, is a broad term for the tools, machines, and systems that humans develop to solve problems or fulfill objectives. Beginning with primitive tools like stone axes and the control of fire, technology has evolved significantly throughout human history. It has been instrumental in every era, from the invention of the wheel and advanced irrigation systems in ancient civilizations to the rise of universities and the printing press during the medieval and Renaissance periods. The Industrial Revolution, which began in the 18th century, marked a significant shift toward mass production and innovation, and later gave rise to modern technologies like electricity, automobiles, and digital communication platforms. Today, technology is integral to many aspects of life and society, driving economic growth and societal change while also raising concerns about security, privacy, and environmental impact. Further advances are expected, and the rise of artificial intelligence is predicted to have significant implications for the job market.
2. computer. A computer is a device that manipulates data or information according to sets of instructions known as programs. Computers can perform a wide range of tasks, from simple arithmetic to complex data processing and analysis. They have evolved over the years from primitive counting tools like the abacus to modern digital machines. The heart of a computer is its central processing unit (CPU), which includes an arithmetic logic unit (ALU) for performing mathematical operations and registers for holding data. Computers also have memory units, such as ROM and RAM, for storing information. Other components include input/output (I/O) devices that allow interaction with the machine and integrated circuits that implement much of the computer's functionality. Key historical innovations, such as Charles Babbage's design for a programmable computer (the Analytical Engine) and the development of the first automatic electronic digital computer, the Atanasoff-Berry Computer (ABC), contributed greatly to their evolution. Today, computers power the Internet, linking billions of users worldwide, and have become an essential tool in almost every industry.
Speech recognition (Wikipedia)

Speech recognition is an interdisciplinary subfield of computer science and computational linguistics that develops methodologies and technologies that enable the recognition and translation of spoken language into text by computers. It is also known as automatic speech recognition (ASR), computer speech recognition or speech to text (STT). It incorporates knowledge and research in the computer science, linguistics and computer engineering fields. The reverse process is speech synthesis.

Some speech recognition systems require "training" (also called "enrollment"), in which an individual speaker reads text or isolated vocabulary into the system. The system analyzes the person's specific voice and uses it to fine-tune the recognition of that person's speech, resulting in increased accuracy. Systems that use training are called "speaker-dependent" systems. Systems that do not use training are called "speaker-independent" systems.
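The effect of enrollment can be illustrated with a deliberately tiny sketch. Everything in it is an assumption made for illustration: the two-dimensional "feature vectors", the template values, and the helper names `recognize` and `enroll` are hypothetical, and real systems adapt far richer acoustic models than a nearest-template lookup.

```python
from math import dist

# Hypothetical speaker-independent templates: one toy 2-D feature vector per word.
GENERIC_TEMPLATES = {
    "yes": (1.0, 0.0),
    "no": (0.0, 1.0),
}

def recognize(features, templates):
    """Return the word whose template is closest (Euclidean) to the features."""
    return min(templates, key=lambda word: dist(features, templates[word]))

def enroll(samples, templates, weight=0.5):
    """Blend each generic template with the mean of one speaker's samples.

    `samples` maps a word to the feature vectors the speaker produced for it.
    Words without samples keep their generic template.
    """
    adapted = dict(templates)
    for word, vectors in samples.items():
        mean = tuple(sum(axis) / len(vectors) for axis in zip(*vectors))
        adapted[word] = tuple(
            (1 - weight) * g + weight * m for g, m in zip(templates[word], mean)
        )
    return adapted

# Made-up enrollment data: this speaker's "yes" lies close to the generic "no".
speaker_samples = {
    "yes": [(0.3, 0.8), (0.3, 0.6)],
    "no": [(-0.4, 1.2), (-0.2, 1.4)],
}
speaker_templates = enroll(speaker_samples, GENERIC_TEMPLATES)

utterance = (0.4, 0.6)  # the enrolled speaker saying "yes"
print(recognize(utterance, GENERIC_TEMPLATES))   # speaker-independent result
print(recognize(utterance, speaker_templates))   # speaker-dependent result
```

With these made-up numbers the generic templates mistake the utterance for "no", while the templates adapted during enrollment return "yes", which is the kind of accuracy gain the paragraph above describes.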

Speech recognition applications include voice user interfaces such as voice dialing (e.g., "call home"), call routing (e.g., "I would like to make a collect call"), domotic appliance control, keyword search (e.g., finding a podcast where particular words were spoken), simple data entry (e.g., entering a credit card number), preparation of structured documents (e.g., a radiology report), determining speaker characteristics, speech-to-text processing (e.g., word processors or emails), and aircraft systems (usually termed direct voice input). In education, automatic pronunciation assessment is used for purposes such as spoken language learning.

The terms voice recognition and speaker identification refer to identifying the speaker, rather than what they are saying. Recognizing the speaker can simplify the task of translating speech in systems that have been trained on a specific person's voice, or it can be used to authenticate or verify the identity of a speaker as part of a security process.
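To make the distinction concrete, the verification use case can be sketched as a similarity check between a stored "voiceprint" and a new attempt. This is only a toy under stated assumptions: the three-dimensional vectors, the 0.9 threshold, and the function names are hypothetical, and the vectors are assumed to be non-zero. Real systems derive much larger voice representations from audio.

```python
from math import sqrt

def cosine_similarity(a, b):
    """Cosine of the angle between two non-zero vectors of equal length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = sqrt(sum(x * x for x in a))
    norm_b = sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def verify_speaker(enrolled_voiceprint, attempt, threshold=0.9):
    """Accept the attempt only if it is similar enough to the enrolled voice.

    The threshold is an illustrative value, not a recommended setting.
    """
    return cosine_similarity(enrolled_voiceprint, attempt) >= threshold

# Made-up voiceprints; note that no transcript of the words is involved.
alice_voiceprint = (0.9, 0.1, 0.4)
same_speaker = (0.85, 0.15, 0.45)
other_speaker = (0.1, 0.9, 0.3)

print(verify_speaker(alice_voiceprint, same_speaker))
print(verify_speaker(alice_voiceprint, other_speaker))
```

The check compares who is speaking and never looks at what was said, which is the difference between speaker identification and speech recognition drawn above.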

From the technology perspective, speech recognition has a long history with several waves of major innovations. Most recently, the field has benefited from advances in deep learning and big data. The advances are evidenced not only by the surge of academic papers published in the field, but more importantly by the worldwide industry adoption of a variety of deep learning methods in designing and deploying speech recognition systems.
