Whisper (speech recognition system)


Whisper is a speech recognition[2] system developed by OpenAI[4]. It uses artificial intelligence[1], specifically deep learning, to analyze and transcribe spoken language. Earlier speech recognition systems relied on statistical methods and hidden Markov models, and later on convolutional neural networks and Seq2seq approaches; Whisper is built on a transformer model. It was trained on 680,000 hours of multilingual data using weakly supervised learning, which improves its performance across a variety of datasets. It reduces transcription errors and may also serve as a basis for a unified model for both speech and sound recognition. Whisper segments input audio into 30-second chunks, converts each chunk into a log-Mel spectrogram, passes the result through an encoder, and then generates the text with a decoder. Special tokens tell the decoder which task to perform, such as producing phrase-level timestamps. Overall, Whisper is a significant advancement in speech recognition technology[3].
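
The first step of the pipeline described above, cutting audio into 30-second chunks, can be sketched in a few lines of plain Python. This is a simplified illustration, not Whisper's actual code: the function name `split_into_chunks` is hypothetical, the 16 kHz sample rate is an assumption not stated in this entry, and the split here uses fixed, non-overlapping windows with zero padding on the last chunk.

```python
SAMPLE_RATE = 16_000   # assumption: samples per second (not stated in this entry)
CHUNK_SECONDS = 30     # from the text: audio is processed in 30-second chunks
CHUNK_SAMPLES = SAMPLE_RATE * CHUNK_SECONDS


def split_into_chunks(samples):
    """Split a sequence of audio samples into fixed 30-second chunks.

    The final chunk is zero-padded so that every chunk has the same length.
    (Hypothetical helper for illustration only.)
    """
    chunks = []
    for start in range(0, len(samples), CHUNK_SAMPLES):
        chunk = list(samples[start:start + CHUNK_SAMPLES])
        # Pad the last, shorter chunk with silence.
        chunk.extend([0.0] * (CHUNK_SAMPLES - len(chunk)))
        chunks.append(chunk)
    return chunks


# 70 seconds of silence becomes three chunks; the last one is padded.
audio = [0.0] * (70 * SAMPLE_RATE)
chunks = split_into_chunks(audio)
print(len(chunks), len(chunks[-1]))  # 3 480000
```

Each chunk would then be converted to a spectrogram and passed to the encoder; those steps depend on the trained model and are not reproduced here.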

Term definitions
1. artificial intelligence.
1 Artificial Intelligence (AI) refers to the field of computer science that aims to create systems capable of performing tasks that would normally require human intelligence. These tasks include reasoning, learning, planning, perception, and language understanding. AI draws from different fields including psychology, linguistics, philosophy, and neuroscience. The field is prominent in developing machine learning models and natural language processing systems. It also plays a significant role in creating virtual assistants and affective computing systems. AI applications extend across various sectors including healthcare, industry, government, and education. Despite its benefits, AI also raises ethical and societal concerns, necessitating regulatory policies. AI continues to evolve with advanced techniques such as deep learning and generative AI, offering new possibilities in various industries.
2 Artificial Intelligence, commonly known as AI, is a field of computer science dedicated to creating intelligent machines that perform tasks typically requiring human intellect. These tasks include problem-solving, recognizing speech, understanding natural language, and making decisions. AI is categorised into two types: narrow AI, which is designed to perform a specific task, like voice recognition, and general AI, which can perform any intellectual task a human being can do. It is a continuously evolving technology that draws from various fields including computer science, mathematics, psychology, linguistics, and neuroscience. The core concepts of AI include reasoning, knowledge representation, planning, natural language processing, and perception. AI has wide-ranging applications across numerous sectors, from healthcare and gaming to military and creative work, and ethical considerations are central to its development and implementation.
2. speech recognition. Speech recognition is a technology that allows computers to interpret human speech and convert it into a machine-readable form, most commonly written text. It originated in the 1950s at Bell Labs with a device named Audrey, designed for single-speaker digit recognition. Notable milestones followed: IBM's demonstration of speech recognition at the 1962 World's Fair, the proposal of linear predictive coding in 1966, and DARPA's funding of Speech Understanding Research in 1971. Later methods such as hidden Markov models and deep learning significantly improved accuracy. The technology is now applied in sectors including in-car systems, education, healthcare, and government intelligence. Its primary function is to convert spoken language into written text, and it has also been used in diagnosing and treating speech disorders.

Whisper is a machine learning model for speech recognition and transcription, created by OpenAI and first released as open-source software in September 2022.

Original author(s): OpenAI
Initial release: September 21, 2022
Repository: https://github.com/openai/whisper
Type

It is capable of transcribing speech in English and several other languages, and is also capable of translating several non-English languages into English. OpenAI claims that the combination of different training data used in its development has led to improved recognition of accents, background noise and jargon compared to previous approaches.
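
As a rough illustration of how one model can switch between the two tasks described above, the sketch below builds the sequence of special tokens that would prefix the decoder's output. The helper names `build_prompt` and `timestamp_token` are hypothetical, and the token strings and two-decimal timestamp format are assumptions based on the token naming in OpenAI's published code; they are not given in this entry.

```python
def build_prompt(language, task="transcribe", timestamps=True):
    """Return the special tokens that tell the decoder what to do.

    Hypothetical helper; token spellings are an assumption, see lead-in.
    """
    if task not in ("transcribe", "translate"):
        raise ValueError("task must be 'transcribe' or 'translate'")
    tokens = ["<|startoftranscript|>", f"<|{language}|>", f"<|{task}|>"]
    if not timestamps:
        # Ask the decoder to emit plain text without timestamp tokens.
        tokens.append("<|notimestamps|>")
    return tokens


def timestamp_token(seconds):
    """Format a time offset within a chunk as a timestamp token."""
    return f"<|{seconds:.2f}|>"


# French speech translated into English, with phrase-level timestamps.
print(build_prompt("fr", task="translate"))
# ['<|startoftranscript|>', '<|fr|>', '<|translate|>']
print(timestamp_token(1.5))  # <|1.50|>
```

Under this scheme, the language token describes the spoken input, while the task token decides whether the output stays in that language or is rendered in English.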

Whisper is a weakly-supervised deep learning acoustic model, made using an encoder-decoder transformer architecture.

Whisper V2 was released on December 8, 2022. Whisper V3 was released in November 2023, on the OpenAI Dev Day.